Lectures 1-4 Flashcards

1
Q

Frequentist Statistics

A

What is the probability of a wrong decision about the treatment effect?
►What should we conclude from the observed data given a specified null hypothesis?

2
Q

Bayesian Statistics

A

What should we believe about the treatment effect given the data that are observed?

3
Q

Likelihood Inference

A

What is the evidence about the treatment effect given the data that are observed?

4
Q

Biostatistics

A

The science of learning from biomedical data involving appreciable variability or uncertainty
The application of statistical reasoning and methods to the solution of biological, medical, and public health problems
►The scientific use of quantitative information to describe or draw inferences about natural phenomena
►Scientific—accepted theory (ideas) and practice; ethical standards
►Quantitative information—data reflecting variation in populations
►Inference—to conclude or surmise from evidence

5
Q

Generate hypotheses

A

Ask questions

►Falsifiable

6
Q

Design and conduct studies to generate evidence

A

Collect data

7
Q

Descriptive statistics

A

Describe the distributions of observations

8
Q

Statistical inference

A

Assess strength of evidence in favor of competing hypotheses
►Use data to update beliefs and make decisions
Also known as confirmatory data analysis (CDA)
►Draw conclusions about a population (whole group; true mechanism) from a sample (representative part of a group; “trials”)
►Assess strength of evidence in support of competing hypotheses
►Make comparisons
►Make decisions
►Make predictions

9
Q

Design of a Study

A
Ask a precise, testable, and appropriate question
►Choose a research approach and design
►Define outcome of interest
►Define comparison groups
►Choose a population to study
►Implementation—collect data
10
Q

Descriptive Statistics / Exploratory Data Analysis (EDA)

A

Organization and summarization of data
►Graphical display to visualize important patterns and variation
►Hypothesis generating

11
Q

Explanations

A

hypotheses about mechanisms

12
Q

Variable

A

a characteristic taking on different values

13
Q

Simple

A

scientists prefer simple, rather than complex, explanations
►Occam’s razor
►Principle of parsimony

14
Q

Interrelationships

A

associations; causal connections

15
Q

Variable

A

a characteristic taking on different values

16
Q

Random variable

A

a variable for which the values obtained are usually thought of as arising partly as a result of chance factors

17
Q

Response variable (𝒀)

A

the outcome measure; that which may be affected or caused; often a health measure

18
Q

Explanatory variables (𝑿)

A

those that affect or cause the response:
►Treatment (intervention)—explanatory variable that can be controlled by the scientist
►Risk factors—explanatory variables that influence the risk of the outcome; of scientific interest (e.g., smoking, salt intake, environment) and usually cannot be controlled

19
Q

Quantitative

A

concept of amount; numerical

20
Q

Discrete variables

A

gaps in values; e.g., number of births, number of drinks per week

21
Q

Continuous variables

A

no gaps in values; e.g., blood pressure, age, height, time to seroconversion

22
Q

Special case

A

time-to-event data in which we need to deal with “censoring”


23
Q

Qualitative

A

concept of attribute; categorical

24
Q

Nominal scale

A

Binary or dichotomous—e.g., disease status (diseased or not diseased), vital status (alive or dead)
●Polychotomous or polytomous—e.g., occupation, marital status

25
Q

Ordinal or ordered scale

A

e.g., ratings, preferences

26
Q

Variation

A

refers to the differences among a set of measurements

27
Q

Natural variation

A

differences among persons (experimental units) in the “true” values of the variable of interest

28
Q

Measurement variation (or error)

A

differences between the measured and true values

29
Q

Bias

A

difference between the average (expected) value of a measurement (variable) and the true value that it targets

30
Q

Variance

A

variation among measurements about their average or mean value, even if that mean differs from the true targeted value

31
Q

Mean Squared Error

A

MSE = variance + bias^2
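
The decomposition can be checked numerically. The sketch below is illustrative only: the true value and the measurements are hypothetical, and it assumes the divide-by-n (population) form of the variance, for which the identity holds exactly.

```python
from statistics import fmean, pvariance

# Hypothetical repeated measurements of a quantity whose true value is 10.
true_value = 10.0
measurements = [11.0, 12.0, 13.0, 12.0]

mean_measurement = fmean(measurements)

# Bias: average measurement minus the true value it targets.
bias = mean_measurement - true_value

# Variance: spread of the measurements about their own mean (divide by n).
variance = pvariance(measurements)

# MSE: average squared distance of the measurements from the true value.
mse = fmean([(m - true_value) ** 2 for m in measurements])

print(f"bias={bias}, variance={variance}, mse={mse}")
print("variance + bias^2 =", variance + bias ** 2)
```

With the n − 1 sample variance used elsewhere in this deck, the two sides would agree only approximately.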

32
Q

Cause

A

something that brings about an effect or result

33
Q

Confounder

A
another variable ($X_2$) that needs to be taken into account when assessing the true association between the risk factor $X_1$ and the outcome $Y$
BMI
34
Q

Effect modifier

A

another variable ($X_2$) that identifies subgroups of individuals (units) across which the association between the risk factor $X_1$ and the outcome $Y$ will differ

35
Q

Inference

A

Estimate the association between the outcome of mortality and treatment, and characterize the estimate’s uncertainty

36
Q

Prediction

A

Best predict the outcome of mortality on the basis of available data of treatment and other factors, and characterize the prediction’s accuracy

37
Q

Experimental studies

A

control allocation of “treatment” to subjects (experimental units)

38
Q

Laboratory studies:

A

control variation (e.g., effect of pesticide on rate of mutations in rat pups)

39
Q

Clinical trials

A

randomize to produce groups with similar observed and unobserved characteristics; average over rather than control variation (e.g., compare two treatments to reduce blood pressure)

40
Q

Observational studies

A

do not control allocation of “treatment” to subjects (experimental units)

41
Q

Frequency

A

the count (frequency) of individuals in a particular group

42
Q

Empirical distribution function

A

a frequency distribution which describes an observed set of values of a variable

43
Q

Cumulative frequency

A
the count (frequency) of the number of individuals in a particular age group or lower age group
►That is, the cumulative count
44
Q

Relative frequency

A

the proportion of individuals in a particular age group = the count (frequency) of the number of individuals in a particular age group divided by the overall total

45
Q

Cumulative relative frequency

A

the cumulative proportion of individuals in a particular age group or any lower age group
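
The four frequency measures on cards 41 to 45 can be computed side by side. This is a minimal sketch; the age groups and counts are made up for illustration.

```python
from itertools import accumulate

# Hypothetical age groups (ordered from lowest to highest) and their counts.
age_groups = ["0-19", "20-39", "40-59", "60+"]
frequency = [10, 20, 15, 5]

total = sum(frequency)

# Cumulative frequency: count in this group or any lower group.
cumulative = list(accumulate(frequency))

# Relative frequency: count in the group divided by the overall total.
relative = [count / total for count in frequency]

# Cumulative relative frequency: cumulative count divided by the total.
cumulative_relative = [count / total for count in cumulative]

for row in zip(age_groups, frequency, cumulative, relative, cumulative_relative):
    print(row)
```

The last cumulative frequency always equals the total, and the last cumulative relative frequency always equals 1.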

46
Q

Range

A

difference between largest and smallest values

47
Q

Variance

A

“average” of the squared differences of observations from the sample mean
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

48
Q

Standard deviation

A

$s = \sqrt{s^2}$, the square root of the variance
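
As a quick check of the sample variance and standard deviation, the sketch below computes the variance by hand (dividing by n − 1) and compares it with Python's `statistics` module. The data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical sample of eight observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
s2_manual = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

# Library equivalents (both use the n - 1 denominator).
s2 = variance(data)
s = stdev(data)

print(f"mean={x_bar}, s^2={s2_manual:.4f}, s={sqrt(s2_manual):.4f}")
```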

49
Q

Stats terminology

A

Upper hinge = $Q_3$
►Median = $Q_2$
►Lower hinge = $Q_1$
►Interquartile range (IQR) = $Q_3 - Q_1$
●Contains the middle 50% of the observations
►Whiskers: lines drawn to the smallest and largest actual observations within the calculated fences

50
Q

fences

A

Fences are not observed data points
►Fences are calculated to provide guidelines for identifying outliers
►Upper fence = upper hinge + 1.5 × IQR = $Q_3 + 1.5 \times \mathrm{IQR}$
►Lower fence = lower hinge − 1.5 × IQR = $Q_1 - 1.5 \times \mathrm{IQR}$

51
Q

outliers

A

Outliers are actual observed data values falling beyond the calculated fences (higher or lower)
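
The fence and outlier rules can be sketched in a few lines. The data are hypothetical, and the quartiles here come from `statistics.quantiles` with the "inclusive" method, which is an assumption: textbook hinges and other software may use slightly different quartile definitions.

```python
from statistics import quantiles

# Hypothetical observations with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles Q1, Q2 (median), Q3 under the assumed "inclusive" definition.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences are calculated guidelines, not observed data points.
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Outliers are observed values beyond the fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers extend to the smallest and largest observations within the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

print(f"fences=({lower_fence}, {upper_fence}), whiskers={whiskers}, outliers={outliers}")
```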

52
Q

Positively skewed:

A

more lower values, sparse higher values
►Also: long “tail” of higher values
►Also: mean > median > mode
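
A small made-up data set illustrates the ordering of mean, median, and mode under positive skew. The ordering is a rule of thumb that holds for this example; it is not guaranteed for every skewed data set.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: many low values, one sparse high value.
data = [1, 1, 1, 2, 2, 3, 4, 10]

data_mean = mean(data)      # pulled upward by the long right tail
data_median = median(data)  # middle of the ordered values
data_mode = mode(data)      # most frequent value

print(f"mean={data_mean}, median={data_median}, mode={data_mode}")
```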

53
Q

Negatively skewed

A

reverse of positively skewed

54
Q

Symmetric

A

not skewed in either direction

55
Q

Outlying values

A

Values that are “far” from most values

►Importance: a few outlying values can strongly influence certain statistical summary measures and analyses

56
Q

arithmetic scale

A

each increment represents change by a constant amount

57
Q

logarithmic scale

A

each increment represents change by a constant multiplier
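
The contrast between the two scales can be seen with two hypothetical sequences: one that grows by a constant amount and one that grows by a constant multiplier. On a log scale, the second sequence becomes evenly spaced.

```python
from math import log10

# Hypothetical sequences for illustration.
arithmetic = [10, 20, 30, 40]        # constant amount: +10 per step
geometric = [10, 100, 1000, 10000]   # constant multiplier: x10 per step

# Successive differences are constant on an arithmetic scale.
differences = [b - a for a, b in zip(arithmetic, arithmetic[1:])]

# Successive ratios are constant for the multiplicative sequence.
ratios = [b / a for a, b in zip(geometric, geometric[1:])]

# After taking logs, constant ratios become constant differences.
log_steps = [log10(b) - log10(a) for a, b in zip(geometric, geometric[1:])]

print(differences, ratios, log_steps)
```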

58
Q

Probability

A

provides a measure of the uncertainty associated with the occurrence of events

59
Q

Outcome

A

the exact result of the experiment

60
Q

Event

A

specific way(s) the experiment can turn out

61
Q

mutually exclusive

A

Two events, A and B, are mutually exclusive if the events cannot occur together

62
Q

statistically independent

A

Two events, A and B, are statistically independent if the probability of A occurring is not influenced by the presence or absence of B

63
Q

conditional probability

A

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$, where $P(B) \neq 0$
(Vertical bar | = “given”)
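
A worked example with one roll of a fair six-sided die shows the definition in action. The events chosen here (A = "even", B = "greater than 3") are hypothetical illustrations.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

p_b = prob(B)
p_a_and_b = prob(A & B)  # joint probability: outcomes in both A and B

# Conditional probability of A given B (requires P(B) != 0).
p_a_given_b = p_a_and_b / p_b

print(f"P(B)={p_b}, P(A and B)={p_a_and_b}, P(A|B)={p_a_given_b}")
```

Knowing that the roll exceeds 3 raises the probability of an even roll from 1/2 to 2/3, so these two events are not independent.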

64
Q

statistically independent

A

$P(A \mid B) = P(A)$

►That is, the probability of $A$ occurring is not influenced by the presence or absence of $B$

65
Q

Joint probability

A

“and”: $P(A \text{ and } B)$

66
Q

Multiplication rule

A

From conditional probability, we can write the joint probability as …
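
A standard statement of the rule, obtained by rearranging the conditional probability definition and assuming $P(A) \neq 0$ and $P(B) \neq 0$ so that both conditional probabilities are defined:

```latex
P(A \text{ and } B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A)
```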

67
Q

mutually exclusive

A

Two outcomes or events are mutually exclusive if and only if the probability of their joint outcome equals zero

68
Q

statistically independent

A

Two outcomes or events are statistically independent if and only if the probability of their joint outcome equals the product of the probabilities of occurrence of each outcome
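
The two "if and only if" conditions on cards 67 and 68 can be checked directly. The sketch below uses one roll of a fair die with hypothetical events, and also shows that mutually exclusive events with nonzero probabilities are not independent.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_four = {1, 2, 3, 4}

# Mutually exclusive: the joint probability equals zero.
even_odd_exclusive = prob(even & odd) == 0

# Independent: the joint probability equals the product of the probabilities.
even_low_independent = prob(even & at_most_four) == prob(even) * prob(at_most_four)

# Mutually exclusive events with nonzero probability fail the product rule.
even_odd_independent = prob(even & odd) == prob(even) * prob(odd)

print(even_odd_exclusive, even_low_independent, even_odd_independent)
```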

69
Q

Probability distributions

A

a complete listing of the probabilities for every possible value of a random variable

70
Q

Binomial

A

two possible outcomes
►Underlies much of statistical applications to epidemiology
►Basic model for logistic regression
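
For reference, the binomial probability of k successes in n independent two-outcome trials is C(n, k) p^k (1 − p)^(n − k). The sketch below evaluates it for a hypothetical case; the function name and the values of n and p are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 3 trials, each with success probability 0.5.
n, p = 3, 0.5
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

# A complete probability distribution sums to 1 over all possible values.
print(distribution, sum(distribution))
```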

71
Q

Poisson

A

uses counts of events or rates

►Basis for log-linear and survival models

72
Q

Gaussian (normal) bell-shaped curve

A

sample means are normally distributed or approximately normally distributed

73
Q

Exponential

A

useful in describing times to events and population growth

74
Q

Counting techniques

A

Factorial
►Permutations
►Combinations

75
Q

Factorial

A

“$n$ factorial” = number of possible arrangements (orderings) of $n$ objects

Notation: “$n$ factorial” = $n!$

76
Q

Permutation

A

ordered arrangement of $n$ objects taken $r$ at a time

77
Q

Combination

A

a selection of $n$ objects taken $r$ at a time without regard to order
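
The three counting techniques are available in Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The values n = 5 and r = 2 are arbitrary illustrations.

```python
from math import comb, factorial, perm

n, r = 5, 2  # hypothetical: 5 objects, taken 2 at a time

# Factorial: number of orderings of all n objects, n! = 5*4*3*2*1.
arrangements = factorial(n)

# Permutations: ordered arrangements of r objects, n! / (n - r)!.
ordered = perm(n, r)

# Combinations: selections without regard to order, n! / (r! * (n - r)!).
unordered = comb(n, r)

print(arrangements, ordered, unordered)
```

Each combination of 2 objects corresponds to 2! = 2 permutations, which is why the permutation count is twice the combination count here.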

78
Q

Poisson

A

Describes the totally random (haphazard) occurrences of events in time or objects in space
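
For reference, the Poisson probability of observing k events when the expected count is λ is e^(−λ) λ^k / k!. The sketch below evaluates it for a hypothetical expected count of 2; the function name and the value of λ are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected number of events is lam."""
    return exp(-lam) * lam ** k / factorial(k)

# Hypothetical example: on average 2 events per unit of time (or space).
lam = 2.0
first_few = [poisson_pmf(k, lam) for k in range(5)]

# Probabilities over all possible counts sum to 1 (truncated here at k = 49).
total = sum(poisson_pmf(k, lam) for k in range(50))

print([round(p, 4) for p in first_few], round(total, 6))
```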