Biostatistics Flashcards

Question

Chi-square test for trend

Answer 1

A test applied to a two-dimensional contingency table in which one variable has two categories and the other has k ordered categories, to assess whether there is a difference in the trend of the proportions in the two groups. The result of using the ordering in this way is a test that is more powerful than using the chi-squared statistic to test for independence
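A minimal Python sketch of the trend test, assuming the Cochran-Armitage form of the statistic. Equally spaced scores 0, 1, ..., k-1 are used by default; the function name and default scores are illustrative choices, not part of the definition. The statistic has 1 degree of freedom, so its upper-tail probability can be taken from the standard normal through `math.erfc`.

```python
import math

def chi_square_trend(successes, totals, scores=None):
    """Chi-square test for trend (Cochran-Armitage form) in a 2 x k table.

    successes[i]: count in the first group at ordered level i
    totals[i]:    column total (both groups) at ordered level i
    scores:       numeric scores for the ordered levels (default 0..k-1)
    Returns (chi2, p_value) on 1 degree of freedom.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p = sum(successes) / n                                   # overall proportion
    x_bar = sum(t * x for t, x in zip(totals, scores)) / n   # weighted mean score
    # Numerator: how far the successes lean toward high or low scores
    t_stat = sum(r * (x - x_bar) for r, x in zip(successes, scores))
    variance = p * (1 - p) * sum(t * (x - x_bar) ** 2 for t, x in zip(totals, scores))
    chi2 = t_stat ** 2 / variance
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For the hypothetical table with successes [5, 10, 15] out of [20, 20, 20], the trend statistic is 10 on 1 df (p ≈ 0.0016). The ordinary chi-squared test of independence on the same table also gives 10, but on 2 df (p = e^-5 ≈ 0.0067), which illustrates the power gained by using the ordering.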

Answer 2

A research activity that involves the administration of a test regimen to humans to evaluate its efficacy and safety. The term is subject to wide variation in usage, from the first use in humans without any control treatment to a rigorously designed and executed experiment involving test and control treatments and randomization.

Answer 3

Safety and pharmacologic profiles. The first introduction of a candidate vaccine or a drug into a human population to determine its safety and mode of action. In drug trials, this phase may include studies of dose and route of administration. Usually involves fewer than 100 healthy volunteers

Answer 4

Pilot efficacy studies. Initial trial to examine efficacy usually in 200 to 500 volunteers; with vaccines, the focus is on immunogenicity, and with drugs, on demonstration of safety and efficacy in comparison to other existing regimens. Usually but not always, subjects are randomly allocated to study and control groups.

Answer 5

Extensive clinical trial. This phase is intended for complete assessment of safety and efficacy. It involves larger numbers, perhaps thousands, of volunteers, usually with random allocation to study and control groups, and may be a multicenter trial

Answer 6

With drugs, this phase is conducted after the national drug registration authority (e.g., the FDA) has approved the drug for distribution or marketing. Phase IV trials may include research designed to explore a specific pharmacologic effect, to establish the incidence of adverse reactions, or to determine the effects of long-term use. Ethical review is required for phase IV clinical trials, but not for routine postmarketing surveillance.

Answer 7

The measure of spread for a set of data defined as 100 × standard deviation / mean: for a sample, $CV = 100\,s/\bar{x}$; for a population, $CV = 100\,\sigma/\mu$. Originally proposed as a way of comparing the variability in different distributions, but found to be sensitive to errors in the mean. Without the factor of 100, it is simply the ratio of the standard deviation to the mean. This is meaningful only if the variable is measured on a ratio scale
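A short Python sketch of the definition, using only the standard-library `statistics` module. The `sample` flag (an illustrative parameter name) switches between the sample form 100·s/x̄ and the population form 100·σ/μ. Because the result divides by the mean, it only makes sense for ratio-scale data with a positive mean.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Coefficient of variation as a percentage: 100 * SD / mean."""
    mean = statistics.mean(values)
    # stdev uses the n - 1 divisor (sample); pstdev uses n (population)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return 100 * sd / mean
```

For the values [2, 4, 4, 4, 5, 5, 7, 9] the mean is 5 and the population SD is 2, so the population CV is 40%.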

Answer 8

(concurrent, follow-up, incidence, longitudinal, prospective) The analytic method of epidemiologic study in which subsets of a defined population can be identified who are, have been, or in the future may be exposed or not exposed, or exposed in different degrees, to a factor or factors hypothesized to influence the probability of occurrence of a given disease or other outcome

Answer 9

Mutually exclusive events A and B for which Pr(A)+Pr(B)=1 where Pr denotes probability

Answer 10

The probability that an event occurs given the outcome of some other event, usually written Pr(A|B). For example, the probability of a person being colour blind given that the person is male is about 0.1, and the corresponding probability given that the person is female is approximately 0.0001. It is not, of course, necessary that Pr(A|B) = Pr(B|A); the probability of having spots given that a patient has measles, for example, is very high, whereas the probability of measles given that a patient has spots is much less. If Pr(A|B) = Pr(A), then the events A and B are said to be independent.
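The measles/spots asymmetry can be made concrete with a small Python sketch. The counts below are hypothetical, chosen only for round numbers, and the helper names `prob` and `cond` are illustrative. Conditional probability is computed as Pr(A and B) / Pr(B), using exact fractions.

```python
from fractions import Fraction

# Hypothetical counts for 10,000 patients (not real data)
counts = {
    ("measles", "spots"): 45,
    ("measles", "no spots"): 5,
    ("no measles", "spots"): 455,
    ("no measles", "no spots"): 9495,
}

def prob(counts, event):
    """Pr(event), where event is a predicate on a (disease, sign) key."""
    total = sum(counts.values())
    return Fraction(sum(n for key, n in counts.items() if event(key)), total)

def cond(counts, event, given):
    """Pr(event | given) = Pr(event and given) / Pr(given)."""
    both = prob(counts, lambda key: event(key) and given(key))
    return both / prob(counts, given)

def has_measles(key):
    return key[0] == "measles"

def has_spots(key):
    return key[1] == "spots"

spots_given_measles = cond(counts, has_spots, has_measles)   # 9/10
measles_given_spots = cond(counts, has_measles, has_spots)   # 9/100
```

Here Pr(measles | spots) = 9/100 differs from Pr(measles) = 1/200, so in these made-up data the two events are not independent.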

Answer 11

A range of values, calculated from the sample observations, that is believed, with a particular probability, to contain the true value of a population parameter. For example, a 95% confidence interval implies that, were the estimation process repeated again and again, 95% of the calculated intervals would be expected to contain the true parameter value.
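The repeated-sampling interpretation can be checked by simulation. This Python sketch assumes a normal population with a known standard deviation, so the interval mean ± 1.96·σ/√n has exactly 95% coverage in theory. The parameter values and the function name are arbitrary illustrative choices.

```python
import math
import random
import statistics

def coverage_of_95ci(true_mean=10.0, sd=2.0, n=25, reps=2000, seed=1):
    """Fraction of repeated-sample 95% intervals that contain true_mean.

    Assumes a normal population with known sd (a z-interval).
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96
    half_width = z * sd / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = statistics.fmean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps
```

The returned proportion should be close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the procedure, not one interval.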

Answer 12

(confounding factor, lurking variable, a confound, confounder) an extraneous variable in a statistical model that correlates with both the dependent variable and the independent variable. The methodologies of scientific studies need to control for these factors to avoid a type I error: a false-positive conclusion that the dependent variable is in a causal relationship with the independent variable. Confounding is a major threat to the validity of inferences made about cause and effect (i.e., internal validity), because the observed effects may be attributable to the confounder rather than to the independent variable. A confounding variable is associated with both the probable cause and the outcome. *SEE PARAGRAPH

Answer 13

The table arising when observations on a number of categorical variables are cross-classified. Entries in each cell are the number of individuals with the corresponding combination of variable values. Most common are two-dimensional tables involving two categorical variables. *SEE EXAMPLE. The analysis of such two-dimensional tables generally involves testing for the independence of the two variables using chi-squared statistics

Answer 14

result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions or jumps, e.g. blood pressure

Answer 15

A phase III clinical trial in which an experimental treatment is compared with a control treatment, the latter being either the current standard treatment or a placebo

Answer 16

An index that quantifies the linear relationship between a pair of variables. In a bivariate normal distribution, for example, it is the parameter $\rho$. An estimator of $\rho$ obtained from n sample values of the two variables of interest is Pearson's product moment correlation coefficient, r, given by $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$. The coefficient takes values between -1 and 1, with the sign indicating the direction of the relationship and the numerical magnitude its strength. Values of -1 and 1 indicate that the sample values fall on a straight line. A value of zero indicates the lack of any linear relationship between the two variables.
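A direct Python translation of Pearson's product moment formula, as a sketch: sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. (Python 3.10+ also offers `statistics.correlation`, which computes the same quantity.)

```python
import math

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Points on a rising straight line give r = 1, on a falling line r = -1; the pairs x = [1, 2, 3, 4], y = [1, 3, 2, 4] give r = 0.8.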

Answer 17

Often used simply as an alternative name for explanatory variables, but perhaps more specifically to refer to variables that are not of primary interest in an investigation, but are measured because it is believed that they are likely to affect the response variable and consequently need to be included in analyses and model building

Answer 18

A statistical model used in survival analysis developed by D.R. Cox in 1972 asserting that the effect of the study factors on the hazard rate in the study population is multiplicative and does not change over time

Answer 19

The value with which a statistic calculated from sample data is compared in order to decide whether a null hypothesis should be rejected. The value is related to the particular significance level chosen.

Answer 20

The proportion of patients in a clinical trial transferring from the treatment decided by an initial random allocation to an alternative one

Answer 21

A study that examines the relationship between diseases or other health-related characteristics and other variables of interest as they exist in a defined population at one particular time

Answer 22

The tabulation of a sample of observations in terms of numbers falling below particular values. The empirical equivalent of the cumulative probability distribution. *SEE TABLE

Answer 23

the number of independent units of information in a sample relevant to the estimation of a parameter or calculation of a statistic. For example, in a two-by-two contingency table with a given set of marginal totals, only one of the four cell frequencies is free and the table therefore has a single degree of freedom. In many cases the term corresponds to the number of parameters in a model.

Answer 24

The variable of primary importance in investigations since the major objective is usually to study the effects of treatment and/or other explanatory variables on this variable and to provide suitable models for the relationship between it and the explanatory variables.

Answer 25

general term for methods of summarizing and tabulating data that make their main features more transparent. E.g. calculating means and variances and plotting histograms

Answer 26

A nominal measure with two outcomes (male or female, survival yes or no). Also called binary

Answer 27

one that arranges items into either of two mutually exclusive categories, e.g. yes/no, alive/dead.

Answer 28

result when the number of possible values is either a finite number or a countable number

Answer 29

a countable and finite variable, e.g. grade

Answer 30

any finite or infinite collection of units, which are often people but may be for example institutions, events, etc.

Answer 31

a procedure used in clinical trials to avoid the possible bias that might be introduced if the patient and/or doctor knew which treatment the patient is receiving. If neither patient nor doctor is aware of which treatment has been given, the trial is termed double-blind

Answer 32

provides one way of using categorical predictor variables in various kinds of estimation models, such as linear regression (compare effect coding). Dummy coding uses only ones and zeros to convey all of the necessary information on group membership
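A minimal Python sketch of dummy (reference-cell) coding. The function name and the choice of a single reference category are illustrative assumptions. A variable with k categories becomes k - 1 columns of 0s and 1s, and the reference category is the row of all zeros; including all k columns alongside an intercept would make the columns perfectly collinear.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 column per non-reference category."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

levels, rows = dummy_code(["A", "B", "C", "A"], reference="A")
# levels -> ["B", "C"]; "A" is coded as [0, 0]
```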

Answer 33

a variable taking only one of two possible values, one (usually 1) indicating the presence of a condition, and the other (usually 0) indicating the absence of the condition, used mainly in regression analysis

Answer 34

a measure of the strength of the relationship between two variables. In scientific experiments, it is often useful to know not only whether an experiment has a statistically significant effect, but also the size of any observed effects. In practical situations, effect sizes are helpful for making decisions. Effect size measures are the common currency of meta-analysis studies that summarize the findings from a specific area of research
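The card does not name a particular effect size measure. As one common example (a choice made here, not taken from the card), Cohen's d expresses a difference in means in units of the pooled standard deviation. A Python sketch:

```python
import math
import statistics

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled sample SD."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd
```

For the toy groups [2, 4, 6] and [1, 3, 5] the means differ by 1 and the pooled SD is 2, so d = 0.5, regardless of whether that difference is statistically significant.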

Answer 35

the sample size after dropouts, deaths and other specified exclusions from the original sample

Answer 36

usually encountered in the analysis of contingency tables. Such frequencies are estimates of the values to be expected under the hypothesis of interest. In a two-dimensional table, for example, the expected values under independence are calculated from the product of the appropriate row and column totals divided by the total number of observations
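The row-total × column-total / N rule can be sketched in Python, together with the Pearson chi-squared statistic that compares observed and expected counts. The function names are illustrative. The degrees of freedom for an r × c table are (r - 1)(c - 1).

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Pearson chi-squared statistic and its degrees of freedom."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df
```

For the table [[5, 10, 15], [15, 10, 5]] every expected count is 30 × 20 / 60 = 10, and the statistic is 10 on 2 df.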

Answer 37

involves performing a number of trials to measure the chance of the occurrence of an event or outcome

Answer 38

a study in which the investigator intentionally alters one or more factors under controlled conditions in order to study the effects of doing so

Answer 39

a study in which conditions are under the direct control of the investigator. In epidemiology, a study in which a population is selected for a planned trial of a regimen whose effects are measured by comparing the outcome of the regimen in the experimental group with the outcome of another regimen in a control group

Answer 40

the variables appearing on the right-hand side of the equations defining, for example, multiple regression or logistic regression, and which seek to predict or explain the response variable. Also commonly known as the independent variables

Answer 41

an event, characteristic, or other definable entity that brings about a change in a health condition or other defined outcome

Answer 42

a set of statistical methods for analyzing the correlations among several variables in order to estimate the number of fundamental dimensions that underlie the observed data and to describe and measure those dimensions. Used frequently in the development of scoring systems for rating scales and questionnaires.

Answer 43

designs which allow two or more questions to be addressed in an investigation. The simplest factorial design is one in which each of two treatments or interventions is either present or absent, so that subjects are divided into four groups: those receiving neither treatment, those having only the first treatment, those having only the second treatment and those receiving both treatments. Such designs enable possible interactions between factors to be investigated. A very important special case of a factorial design is that where each of k factors of interest has only two levels; these are usually known as 2^k factorial designs.

Answer 44

the proportion of cases in which a diagnostic test indicates disease is absent in patients who have the disease

Answer 45

the proportion of cases in which a diagnostic test indicates disease is present in disease-free patients
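Both error rates above come from the four cells of a 2 × 2 diagnostic table. A Python sketch, with cell names (tp, fn, fp, tn) and counts chosen here for illustration: the false-negative rate is taken among diseased patients and equals 1 - sensitivity; the false-positive rate is taken among disease-free patients and equals 1 - specificity.

```python
def error_rates(tp, fn, fp, tn):
    """False-negative and false-positive rates from a 2 x 2 diagnostic table.

    tp: diseased, test positive    fn: diseased, test negative
    fp: disease-free, test positive tn: disease-free, test negative
    """
    fnr = fn / (tp + fn)   # test says "absent" among patients with the disease
    fpr = fp / (fp + tn)   # test says "present" among disease-free patients
    return fnr, fpr

# Hypothetical counts: 100 diseased, 300 disease-free patients
fnr, fpr = error_rates(tp=90, fn=10, fp=45, tn=255)
```

With these made-up counts the false-negative rate is 0.10 and the false-positive rate is 0.15.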

Answer 46

the distribution of the ratio of two independent quantities each of which is distributed like a variance in normally distributed samples. So named in honor of R.A. Fisher

Answer 47

An alternative procedure to use of the chi-squared statistic for assessing the independence of two variables forming a two-by-two contingency table, particularly when the expected frequencies are small. The method consists of evaluating the sum of the probabilities associated with the observed table and all possible two-by-two tables that have the same row and column totals as the observed data but exhibit more extreme departure from independence. These probabilities are given by the hypergeometric distribution.
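A Python sketch of the exact test that enumerates every 2 × 2 table with the observed margins and computes each table's hypergeometric probability with exact fractions. "More extreme" is taken here as "no more probable than the observed table", a common two-sided convention; a one-sided version would instead sum the tables in one direction only. The function name is illustrative.

```python
from fractions import Fraction
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return Fraction(comb(row1, x) * comb(row2, col1 - x), comb(n, col1))

    lo, hi = max(0, col1 - row2), min(row1, col1)   # feasible top-left values
    p_obs = prob(a)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs)
```

For [[3, 1], [1, 3]] the five possible tables have probabilities 1, 16, 36, 16 and 1 (all over 70), so the two-sided p-value is 34/70 = 17/35 ≈ 0.486.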

Answer 48

A transformation of Pearson's product moment correlation coefficient, r, given by $z = \frac{1}{2}\ln\frac{1 + r}{1 - r}$
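A typical use of the transformation is an approximate confidence interval for a correlation. This Python sketch assumes the usual large-sample result that z is approximately normal with standard error 1/√(n − 3). The interval is built on the z scale and back-transformed with tanh (z = ½ ln((1 + r)/(1 − r)) is exactly `math.atanh(r)`).

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """Approximate CI for a correlation via Fisher's z transformation."""
    z = math.atanh(r)                 # 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

lo, hi = correlation_ci(0.5, 28)
```

For r = 0.5 and n = 28 this gives roughly (0.16, 0.74). The interval is asymmetric about r, which is why the transformation is used rather than r ± a constant.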

Answer 49

the frequency or occurrence of a disease or other attribute or event in a population without distinguishing between incidence and prevalence

Answer 50

lists data values (individually or by groups of intervals), along with their corresponding frequencies (or counts)

Answer 51

a way of summarizing data; used as a record of how often each value or set of values of a variable occurs. Used to summarize categorical, nominal, and ordinal data. It may also be used to summarize continuous data once the data are divided into categories

Answer 52

a test for the equality of the variances of two populations having normal distributions, based on the ratio of the variances of a sample of observations taken from each. Most often encountered in the analysis of variance, where testing whether particular variances are the same also tests for the equality of a set of means.

Answer 53

normal distribution

Answer 54

a term usually retained for those clinical trials in which there is a random allocation to treatments, a control group and double-blinding

Answer 55

degree of agreement between an empirically observed distribution and a mathematical or theoretical distribution

Answer 56

a statistical test of the hypothesis that data have been randomly sampled or generated from a population that follows a particular theoretical distribution or model. The most common such tests are chi-square tests
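A Python sketch of a chi-square goodness-of-fit test. The counts and the 1:2:1 hypothesized ratio are hypothetical. The degrees of freedom are the number of categories minus 1, assuming no parameters were estimated from the data. With exactly 2 df the chi-square upper tail has the closed form exp(-x/2), so no statistics library is needed for this example.

```python
import math

def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic and df (no estimated parameters)."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Hypothetical counts tested against a 1:2:1 ratio
chi2, df = chi_square_gof([30, 50, 20], [0.25, 0.5, 0.25])
p_value = math.exp(-chi2 / 2)   # valid only because df == 2
```

Here the expected counts are 25, 50 and 25, the statistic is 2.0 on 2 df, and p = e^-1 ≈ 0.37, so these counts give no evidence against the 1:2:1 model.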

Answer 57

inherent capability of an agent or situation to have an adverse effect. A factor or exposure that may adversely affect health

Answer 58

a theoretical measure of the risk of an occurrence of an event, e.g. death or new disease, at a point in time, t, defined mathematically as the limit, as Δt approaches zero, of the probability that an individual well at time t will experience the event by t + Δt, divided by Δt: $h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$, where T is the time to the event

Answer 59

A graphical representation of a set of observations in which class frequencies are represented by the areas of rectangles centred on the class interval. If the latter are all equal, the heights of the rectangles are also proportional to the observed frequencies.

Answer 60

a group of patients treated in the past with a standard therapy, used as the control group for evaluating a new treatment on current patients. Although used fairly frequently in medical investigations, the approach is not to be recommended, since possible biases due to other factors that may have changed over time can never be satisfactorily eliminated

Answer 61

a term that is used to indicate the equality of some quantity of interest (most often a variance), in a number of different groups, populations, etc.

Answer 62

the constancy of the variance of a measure over levels of the factors under study

Answer 63

procedure of assessing whether sample data is consistent or otherwise with statements made about the population

Answer 64

a measure of the rate at which people without a disease develop the disease during a specific period of time: Incidence = (number of new cases of a disease during a period of time) / (population at risk of the disease during that period). It measures the appearance of disease; the numerator is the number of new events (new cases of disease) in a specified population within a specified period of time. Incidence is not the same as the incidence rate
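A Python sketch contrasting the two quantities on the card. The card defines incidence as new cases over the population at risk. The incidence rate shown alongside it uses person-time at risk as the denominator; that definition is standard but is added here, not taken from the card. The numbers are hypothetical.

```python
def cumulative_incidence(new_cases, population_at_risk):
    """Proportion of the at-risk population developing disease in the period."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time at risk (e.g. per person-year)."""
    return new_cases / person_time

# Hypothetical cohort: 1000 people at risk followed for 2 years,
# 30 new cases, 1950 person-years actually at risk
risk = cumulative_incidence(30, 1000)    # 0.03, i.e. 3% over 2 years
rate = incidence_rate(30, 1950)          # about 0.0154 per person-year
```

The first is a unitless proportion tied to a stated period; the second has units of 1/time, which is one reason the two should not be used interchangeably.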

Answer 65

two events are said to be independent if the occurrence of one is in no way predictable from the occurrence of the other. Two variables are said to be independent if the distribution of values of one is the same for all the values of the other

Answer 66

variables appearing on the right-hand side of the equations defining multiple regression or logistic regression, and which seek to predict or explain the response variable

Answer 67

the process of drawing conclusions about a population on the basis of measurements or observations made on a sample of individuals from the population

Answer 68

a term applied when two or more explanatory variables do not act independently on a response variable. *SEE GRAPHIC

Answer 69

the parameter in an equation derived from a regression analysis corresponding to the expected value of the response variable when all the explanatory variables are zero.

Answer 70

a measure of spread given by the difference between the third and first quartiles of a sample
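A Python sketch using the standard library's `statistics.quantiles`. Different software uses different quartile definitions. This one relies on the module's default "exclusive" method, so results may differ slightly from other packages on small samples.

```python
import statistics

def iqr(values):
    """Interquartile range Q3 - Q1 (statistics.quantiles default method)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1
```

For the values 1 through 9, this method gives Q1 = 2.5 and Q3 = 7.5, so the IQR is 5.0.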

Answer 71

the degree of agreement among raters. It gives a score of how much homogeneity or consensus there is in the ratings given by judges. It is useful in refining the tools given to human judges, for example by determining if a particular scale is appropriate for measuring a particular variable. If various raters do not agree, either the scale is defective or the raters need to be re-trained. There are a number of statistics which can be used to determine inter-rater reliability. Different statistics are appropriate for different types of measurement. Some options are: joint-probability

Answer 72

conditions are under the direct control of the investigator. In epidemiology, a study in which a population is selected for a planned trial of a regimen whose effects are measured by comparing the outcomes of the regimen in the experimental group with the outcome of another regimen in a control group.

Answer 73

a nonparametric method of compiling life or survival tables. This combines calculated probabilities of survival and estimates to allow for censored observations, which are assumed to occur randomly. The intervals are defined as ending each time an event (death, withdrawal) occurs and are therefore unequal
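A minimal Python sketch of the Kaplan-Meier estimator. Survival drops only at times when at least one event occurs, by the factor (1 − deaths / number at risk). Censored subjects leave the risk set afterwards. When a censoring time ties with an event time, the censored subject is counted as still at risk at that time, which is the usual convention. The input format (parallel lists of times and 0/1 event flags) is an assumption of this sketch.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times:  follow-up time for each subject
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time (events and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed   # both deaths and censored leave the risk set
    return curve
```

For hypothetical times [1, 2, 2, 3, 4, 5] with events [1, 1, 0, 1, 0, 1], the estimates are 5/6, 2/3 and 4/9 at times 1, 2 and 3, and 0 at time 5. The censored subject at time 4 lowers the number at risk but does not change the curve.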

Answer 74

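a measure of the degree of nonrandom agreement between observers or measurements of the same categorical variable: $\kappa = \frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance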
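A Python sketch of kappa for two raters, assuming Cohen's form of chance agreement: each rater's marginal proportions are multiplied category by category and summed. The ratings in the example are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters classifying the same items."""
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal frequencies
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)
```

For ratings ["y", "y", "n", "n"] and ["y", "n", "n", "n"], observed agreement is 0.75 and chance agreement is 0.5, so kappa = 0.5.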

Answer 75

the extent to which a unimodal distribution is peaked

Answer 76

a principle of estimation, attributable to Gauss, in which the estimates of a set of parameters in a statistical model are those quantities that minimize the sum of squared differences between the observed values of the dependent variable and the values predicted by the model
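For a straight-line model the least squares estimates have a closed form: slope = Sxy / Sxx and intercept = ȳ − slope · x̄. A Python sketch of that special case follows; fitting models with several parameters generally needs matrix methods instead.

```python
def least_squares_line(x, y):
    """Intercept and slope minimizing the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope
```

Points lying exactly on y = 1 + 2x are recovered exactly; for x = [1, 2, 3, 4], y = [1, 3, 2, 4] the fitted line is y = 0.5 + 0.8x.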

Answer 77

the level of probability at which it is agreed that the null hypothesis will be rejected. Conventionally set at 0.05.

Answer 78

a procedure often applied in prospective studies to examine the distribution of mortality and/or morbidity in one or more diseases in a cohort study of patients over a fixed period of time. For each specific increment in the follow-up period, the number entering the period, the number leaving during the period, and the number either dying from the disease or developing the disease are all calculated. It is assumed that an individual who has not completed the follow-up period is exposed for half this period, thus enabling the data for those 'leaving' and those 'staying' to be combined into an appropriate denominator for the estimation of the percentage dying from or developing the disease. The advantage of this approach is that all patients, not only those who have been involved for an extended period, can be included in the estimation process.
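The half-interval assumption can be sketched in Python. In each interval, the effective number at risk is the number entering minus half the withdrawals. The interval risk is deaths divided by that denominator, and cumulative survival is the running product of (1 − risk). The input layout (a starting count plus a list of (deaths, withdrawals) pairs) is an assumption of this sketch.

```python
def actuarial_life_table(n_start, intervals):
    """Actuarial life table.

    intervals: list of (deaths, withdrawals) for successive follow-up intervals
    Returns rows of (entering, deaths, withdrawals, effective_n, q, cum_survival).
    """
    rows = []
    entering = n_start
    cum_survival = 1.0
    for deaths, withdrawals in intervals:
        effective = entering - withdrawals / 2   # withdrawals exposed for half the interval
        q = deaths / effective                   # estimated risk in this interval
        cum_survival *= 1 - q
        rows.append((entering, deaths, withdrawals, effective, q, cum_survival))
        entering -= deaths + withdrawals         # survivors carried into next interval
    return rows

# Hypothetical cohort of 100 followed over two intervals
rows = actuarial_life_table(100, [(10, 20), (5, 10)])
```

In the first interval the effective denominator is 100 − 20/2 = 90, so the estimated risk is 10/90 rather than 10/100 or 10/80.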

Answer 79

a function constructed from a statistical model and a set of observed data that gives the probability of the observed data for various values of the unknown model parameters. The parameter values that maximize the probability are the maximum likelihood estimates of the parameters.

Answer 80

the ratio of the likelihood of observing data under actual conditions, to observing these data under the other, e.g., 'ideal' conditions; or comparison of various model conditions to assess which model provides the best fit. Likelihood ratios are used to appraise screening and diagnostic tests in clinical epidemiology

Answer 81

a statistical test based on the ratio of the maximum value of the likelihood function under one statistical model to the maximum value under another statistical model; the models differ in that one includes and the other excludes one or more parameters.

Answer 82

a form of regression analysis in which observational data are modeled by a function which is a linear combination of the model parameters and depends on one or more independent variables. In simple linear regression the model function represents a straight line. The results of data fitting are subject to statistical analysis. The data consist of m values taken from observations of the dependent variable (response variable) y. The independent variables are also called regressors, exogenous variables, input variables and predictor variables. In simple linear regression the data model is written $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, for $i = 1, \ldots, m$

Answer 83

a form of regression analysis used when the response variable is a binary variable. The method is based on the logistic transformation or logit of a proportion, namely $\operatorname{logit}(p) = \ln\frac{p}{1 - p}$.

Answer 84

a statistical model of an individual's risk (probability of disease y) as a function of a risk factor x: $\Pr(y = 1 \mid x) = \frac{1}{1 + e^{-(\alpha + \beta x)}}$

Answer 85

the logarithm of the ratio of frequencies of two different categorical outcomes such as healthy vs sick

Answer 86

a linear model for the logit (natural log of the odds) of disease as a function of a quantitative factor X: $\ln\frac{p}{1 - p} = \alpha + \beta X$. This model is mathematically equivalent to the logistic model.
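The equivalence of the logistic and logit forms can be checked numerically. This Python sketch uses arbitrary hypothetical parameter values α = −2 and β = 0.5. Applying the logit to the logistic-model risk recovers the linear predictor α + βx. A one-unit increase in x multiplies the odds by e^β.

```python
import math

def logistic(alpha, beta, x):
    """Risk Pr(y = 1 | x) under the logistic model."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

alpha, beta = -2.0, 0.5          # hypothetical parameter values
p3 = logistic(alpha, beta, 3.0)  # risk at x = 3
p4 = logistic(alpha, beta, 4.0)  # risk at x = 4
odds_ratio = (p4 / (1 - p4)) / (p3 / (1 - p3))
```

Here logit(p3) returns −0.5 = α + 3β, and the odds ratio for one unit of x equals e^0.5 ≈ 1.65.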

Answer 87

a statistical model that uses an analysis of variance type of approach for the modeling of frequency counts in contingency tables

Answer 88

a test for comparing two or more sets of survival times, to assess the null hypothesis that there is no difference in the survival experience of the individuals in the different groups.
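A Python sketch of the two-group log-rank test in its usual form. At each distinct event time, the observed events in group 1 are compared with the number expected if both groups shared the same hazard. The hypergeometric variances are summed, and (O − E)² / V is referred to a chi-square distribution on 1 df. The input format and function name are assumptions. The loop is quadratic in the number of subjects, which is fine for illustration only.

```python
import math

def log_rank_test(times1, events1, times2, events2):
    """Two-group log-rank test; events are 1 (observed) or 0 (censored).

    Returns (chi2, p_value) on 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 0)   # at risk, group 1
        n2 = sum(1 for tt, _, g in data if tt >= t and g == 1)   # at risk, group 2
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)          # all events at t
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n                              # observed - expected
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square upper tail, 1 df
    return chi2, p_value

# Hypothetical data: every subject in group 1 fails before any in group 2
chi2, p = log_rank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

For this small made-up example the statistic is about 5.05 (p ≈ 0.025). Identical groups give a statistic of 0.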

Answer 89

Subjects are measured repeatedly through time.

(113 cards)