Lectures 1-4 Flashcards

1
Q

Frequentist Statistics

A

What is the probability of a wrong decision about the treatment effect?
►What should we conclude from the observed data given a specified null hypothesis?

2
Q

Bayesian Statistics

A

What should we believe about the treatment effect given the data that are observed?

3
Q

Likelihood Inference

A

What is the evidence about the treatment effect given the data that are observed?

4
Q

Biostatistics

A

The science of learning from biomedical data involving appreciable variability or uncertainty
The application of statistical reasoning and methods to the solution of biological, medical, and public health problems
►The scientific use of quantitative information to describe or draw inferences about natural phenomena
►Scientific—accepted theory (ideas) and practice; ethical standards
►Quantitative information—data reflecting variation in populations
►Inference—to conclude or surmise from evidence

5
Q

Generate hypotheses

A

Ask questions

►Falsifiable

6
Q

Design and conduct studies to generate evidence

A

Collect data

7
Q

Descriptive statistics

A

Describe the distributions of observations

8
Q

Statistical inference

A

Assess strength of evidence in favor of competing hypotheses
►Use data to update beliefs and make decisions
►Also known as confirmatory data analysis (CDA)
►Draw conclusions about a population (whole group; true mechanism) from a sample (representative part of a group; “trials”)
►Make comparisons
►Make decisions
►Make predictions

9
Q

Design of a Study

A
Ask a precise, testable, and appropriate question
►Choose a research approach and design
►Define outcome of interest
►Define comparison groups
►Choose a population to study
►Implementation—collect data
10
Q

Descriptive Statistics / Exploratory Data Analysis (EDA)

A

Organization and summarization of data
►Graphical display to visualize important patterns and variation
►Hypothesis generating

11
Q

Explanations

A

hypotheses about mechanisms

12
Q

Variable

A

a characteristic taking on different values

13
Q

Simple

A

scientists prefer simple, rather than complex, explanations
►Occam’s razor
►Principle of parsimony

14
Q

Interrelationships

A

associations; causal connections

16
Q

Random variable

A

a variable for which the values obtained are usually thought of as arising partly as a result of chance factors

17
Q

Response variable (𝒀)

A

the outcome measure; that which may be affected or caused; often a health measure

18
Q

Explanatory variables (𝑿)

A

those that affect or cause the response:
►Treatment (intervention)—explanatory variable that can be controlled by the scientist
►Risk factors—explanatory variables that influence the risk of the outcome; of scientific interest (e.g., smoking, salt intake, environment) and usually cannot be controlled

19
Q

Quantitative

A

concept of amount; numerical

20
Q

Discrete variables

A

gaps in values; e.g., number of births, number of drinks per week

21
Q

Continuous variables

A

no gaps in values; e.g., blood pressure, age, height, time to seroconversion

22
Q

Special case

A

time-to-event data in which we need to deal with “censoring”

23
Q

Qualitative

A

concept of attribute; categorical

24
Q

Nominal scale

A

Binary or dichotomous—e.g., disease status (diseased or not diseased), vital status (alive or dead)
●Polychotomous or polytomous—e.g., occupation, marital status

25
Ordinal or ordered scale
e.g., ratings, preferences
26
Variation
refers to the differences among a set of measurements
27
Natural variation
differences among persons (experimental units) in the “true” values of the variable of interest
28
Measurement variation (or error)
differences between the measured and true values
29
Bias
difference between the average (expected) value of a measurement (variable) and the true value that it targets
30
Variance
variation among measurements about their average or mean value, even if that mean differs from the true targeted value
31
Mean Squared Error
$\text{MSE} = \text{variance} + \text{bias}^2$
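A numerical check of the decomposition, as a minimal sketch: the readings below are made up, and the true value of 120 is a hypothetical target (for example, a systolic blood pressure). Variance here is the divide-by-n spread about the observed mean. With that divisor, the identity holds exactly for the observed set; with the n−1 sample variance it holds only approximately.

```python
from statistics import fmean

def mse_decomposition(measurements, true_value):
    """Return (mse, variance, bias) for repeated measurements of a known true value."""
    mean = fmean(measurements)
    bias = mean - true_value  # systematic offset of the average from the truth
    variance = fmean((x - mean) ** 2 for x in measurements)  # spread about the mean (divide by n)
    mse = fmean((x - true_value) ** 2 for x in measurements)  # average squared error from the truth
    return mse, variance, bias

# Hypothetical readings of a quantity whose true value is 120
readings = [122, 125, 119, 124, 123]
mse, var, bias = mse_decomposition(readings, true_value=120)
print(mse, var + bias ** 2)
```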
32
Cause
something that brings about an effect or result
33
Confounder
another variable ($X_2$, e.g., BMI) that needs to be taken into account when assessing the true association between the risk factor $X_1$ and the outcome $Y$
34
Effect modifier
another variable ($X_2$) that identifies subgroups of individuals (units) across which the association between the risk factor $X_1$ and the outcome $Y$ will differ
35
Inference
Estimate the association between the outcome of mortality and treatment, and characterize the estimate’s uncertainty
36
Prediction
Best predict the outcome of mortality on the basis of available data of treatment and other factors, and characterize the prediction’s accuracy
37
Experimental studies
control allocation of “treatment” to subjects (experimental units)
38
Laboratory studies:
control variation (e.g., effect of pesticide on rate of mutations in rat pups)
39
Clinical trials
randomize to produce groups with similar observed and unobserved characteristics; average over rather than control variation (e.g., compare two treatments to reduce blood pressure)
40
Observational studies
do not control allocation of “treatment” to subjects (experimental units)
41
Frequency
the count (frequency) of individuals in a particular group
42
Empirical distribution function
a frequency distribution which describes an observed set of values of a variable
43
Cumulative frequency
the count (frequency) of individuals in a particular age group or any lower age group ►That is, the cumulative count
44
Relative frequency
the proportion of individuals in a particular age group = the count (frequency) of the number of individuals in a particular age group divided by the overall total
45
Cumulative relative frequency
the cumulative proportion of individuals in a particular age group or any lower age group
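The four frequency cards (frequency, cumulative frequency, relative frequency, cumulative relative frequency) can be tied together in one table. This is a minimal sketch that assumes a made-up list of ages and hypothetical 10-year age groups. The `age_group` helper is illustrative and does not come from the lecture.

```python
from collections import Counter

def age_group(age):
    """Hypothetical 10-year grouping: 23 -> '20-29', 37 -> '30-39', ..."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"

def frequency_table(ages):
    """Rows of (group, frequency, cumulative frequency, relative, cumulative relative)."""
    counts = Counter(age_group(a) for a in ages)
    total = len(ages)
    rows = []
    cumulative = 0
    # Order groups by their lower bound so "lower age groups" come first
    for group in sorted(counts, key=lambda g: int(g.split("-")[0])):
        freq = counts[group]
        cumulative += freq
        rows.append((group, freq, cumulative, freq / total, cumulative / total))
    return rows

ages = [23, 27, 31, 35, 38, 41, 44, 45, 52, 67]
for row in frequency_table(ages):
    print(row)
```

Only groups that contain at least one observation appear. The last row's cumulative relative frequency is always 1.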
46
Range
difference between largest and smallest values
47
Variance
“average” of the squared differences of observations from the sample mean: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
48
Standard deviation
$s = \sqrt{s^2}$, the square root of the variance
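The range, sample variance, and standard deviation cards can be computed directly from their definitions. This sketch uses a small made-up data set and cross-checks the result against Python's `statistics.variance` and `statistics.stdev`, which also use the n−1 divisor.

```python
import math
import statistics

def describe_spread(xs):
    """Return (range, sample variance, standard deviation) of a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    value_range = max(xs) - min(xs)  # largest minus smallest value
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance, divisor n - 1
    s = math.sqrt(s2)  # standard deviation = square root of the variance
    return value_range, s2, s

data = [4, 8, 6, 5, 7]
rng, s2, s = describe_spread(data)
print(rng, s2, s)
print(statistics.variance(data), statistics.stdev(data))  # library cross-check
```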
49
Box plot terminology
Upper hinge = $Q_3$ ►Median = $Q_2$ ►Lower hinge = $Q_1$ ►Interquartile range (IQR) = $Q_3 - Q_1$ ●Contains the middle 50% of the observations ►Whiskers: lines drawn to the smallest and largest actual observations within the calculated fences
50
fences
Fences are not observed data points ►Fences are calculated to provide guidelines for identifying outliers ►$\text{Upper fence} = \text{upper hinge} + 1.5 \times \text{IQR} = Q_3 + 1.5 \times \text{IQR}$ ►$\text{Lower fence} = \text{lower hinge} - 1.5 \times \text{IQR} = Q_1 - 1.5 \times \text{IQR}$
51
outliers
Outliers are actual observed data values falling beyond the calculated fences (higher or lower)
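The hinges, IQR, fences, whiskers, and outliers fit together as shown in the sketch below. One assumption to flag: software computes $Q_1$ and $Q_3$ in slightly different ways. This sketch uses the `"inclusive"` method of Python's `statistics.quantiles`, so its hinge values may differ a little from the lecture's or another package's. The data are made up.

```python
import statistics

def box_plot_summary(xs):
    """Quartiles, IQR, fences, whiskers, and outliers for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr  # calculated guideline, not an observed value
    lower_fence = q1 - 1.5 * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    # Whiskers end at the most extreme observed values still inside the fences
    whiskers = (min(inside), max(inside))
    return {"Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
            "fences": (lower_fence, upper_fence),
            "whiskers": whiskers, "outliers": outliers}

data = [2, 3, 4, 5, 5, 6, 7, 8, 9, 30]
print(box_plot_summary(data))
```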
52
Positively skewed:
more lower values, sparse higher values ►Also: long “tail” of higher values ►Also: mean > median > mode
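The sketch below shows the usual ordering for positive skew on a made-up, right-skewed data set (hypothetical hospital lengths of stay, in days). The ordering mean > median > mode is the typical pattern for positively skewed data, not a guarantee for every data set.

```python
import statistics

# Made-up right-skewed data: many low values and a sparse long tail of high values
lengths_of_stay = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

mean = statistics.mean(lengths_of_stay)
median = statistics.median(lengths_of_stay)
mode = statistics.mode(lengths_of_stay)
print(mean, median, mode)  # the long right tail pulls the mean above the median
```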
53
Negatively skewed
reverse of positively skewed
54
Symmetric
not skewed in either direction
55
Outlying values
Values that are “far” from most values ►Importance: a few outlying values can strongly influence certain statistical summary measures and analyses
56
arithmetic scale
each increment represents change by a constant amount
57
logarithmic scale
each increment represents change by a constant multiplier
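The difference between the two scales is easiest to see from the tick marks. In this sketch, the step of +10 and the factor of ×10 are arbitrary choices for illustration.

```python
import math

# Arithmetic scale: each tick adds a constant amount (here +10)
arithmetic_ticks = [10 * k for k in range(5)]
# Logarithmic scale: each tick multiplies by a constant factor (here x10)
log_ticks = [10 ** k for k in range(5)]

print(arithmetic_ticks)  # [0, 10, 20, 30, 40]
print(log_ticks)         # [1, 10, 100, 1000, 10000]

# Taking logs turns equal ratios into equal distances on the axis
print([math.log10(t) for t in log_ticks])
```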
58
Probability
provides a measure of the uncertainty associated with the occurrence of events
59
Outcome
the exact result of the experiment
60
Event
specific way(s) the experiment can turn out
61
mutually exclusive
Two events, A and B, are mutually exclusive if the events cannot occur together
62
statistically independent
Two events, A and B, are statistically independent if the probability of A occurring is not influenced by the presence or absence of B
63
conditional probability
$P(A \mid B) = P(A \text{ and } B) / P(B)$, where $P(B) \neq 0$ (the vertical bar | means “given”)
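The conditional probability formula can be applied to a 2×2 table of counts, as in the minimal sketch below. The counts are hypothetical, with A = disease present and B = exposed. Fractions keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical 2x2 table of counts: A = disease present, B = exposed
counts = {
    ("A", "B"): 30, ("A", "not B"): 20,
    ("not A", "B"): 70, ("not A", "not B"): 380,
}
total = sum(counts.values())

p_a_and_b = Fraction(counts[("A", "B")], total)
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)
p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)

# P(A | B) = P(A and B) / P(B), defined only when P(B) != 0
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b, p_a)  # 3/10 1/10
```

Here $P(A \mid B) \neq P(A)$, so in these made-up data A and B are not statistically independent.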
64
statistically independent
$P(A \mid B) = P(A)$ ►That is, the probability of $A$ occurring is not influenced by the presence or absence of $B$
65
Joint probability
“and”: $P(A \text{ and } B)$
66
Multiplication rule
From conditional probability, we can write the joint probability as …
67
mutually exclusive
Two outcomes or events are mutually exclusive if and only if the probability of their joint outcome equals zero
68
statistically independent
Two outcomes or events are statistically independent if and only if the probability of their joint outcome equals the product of the probabilities of occurrence of each outcome
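The joint-probability cards can be checked by enumerating a small sample space. This sketch uses two fair dice. It checks the multiplication rule in the form $P(A \text{ and } B) = P(A \mid B)\,P(B)$, which follows from rearranging the conditional probability definition. It also checks the product criterion for independence and the zero-joint-probability criterion for mutually exclusive events. The events A, B, and C are arbitrary examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under equal likelihood."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def first_is_six(o):    # event A
    return o[0] == 6

def second_is_even(o):  # event B
    return o[1] % 2 == 0

def first_is_one(o):    # event C
    return o[0] == 1

p_a_and_b = prob(lambda o: first_is_six(o) and second_is_even(o))

# P(A | B) computed by restricting attention to the outcomes in B
b_outcomes = [o for o in space if second_is_even(o)]
p_a_given_b = Fraction(sum(1 for o in b_outcomes if first_is_six(o)), len(b_outcomes))

# Multiplication rule: P(A and B) = P(A | B) * P(B)
print(p_a_and_b == p_a_given_b * prob(second_is_even))  # True

# Independence: joint probability equals the product of the individual probabilities
print(p_a_and_b == prob(first_is_six) * prob(second_is_even))  # True

# Mutually exclusive: the first die cannot show both 6 and 1
p_a_and_c = prob(lambda o: first_is_six(o) and first_is_one(o))
print(p_a_and_c)  # 0
```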
69
Probability distributions
a complete listing of the probabilities for every possible value of a random variable
70
Binomial
two possible outcomes ►Underlies many statistical applications in epidemiology ►Basic model for logistic regression
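For a count of "successes" among n independent trials, each with the same two possible outcomes, the binomial probability is $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$. Below is a minimal sketch of that formula. The example of 10 patients, each responding with probability 0.3, is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 10 patients, each responds with probability 0.3
n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(round(probs[3], 4))  # probability that exactly 3 of 10 respond
print(sum(probs))          # probabilities over all possible values sum to 1
```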
71
Poisson
uses counts of events or rates ►Basis for log-linear and survival models
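For event counts occurring at an average rate $\lambda$ per interval, the Poisson probability is $P(X = k) = e^{-\lambda} \lambda^k / k!$. Below is a short sketch of that formula. The rate of 2 new cases per week is a hypothetical example.

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) for a Poisson count with mean `rate` events per interval."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Hypothetical: an average of 2 new cases per week
rate = 2.0
for k in range(5):
    print(k, round(poisson_pmf(k, rate), 4))
```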
72
Gaussian (normal) bell-shaped curve
sample means are normally or approximately normally distributed
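The simulation below illustrates the central limit theorem behind this card: sample means tend to look approximately normal even when the underlying population does not. As a minimal sketch, it assumes an exponential population (mean 1, SD 1), chosen only because it is clearly skewed. The sample size of 50 and the 2000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def sample_means(num_samples, sample_size, rate=1.0):
    """Means of repeated samples drawn from a skewed exponential population."""
    return [
        statistics.fmean(random.expovariate(rate) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

sample_size = 50
means = sample_means(num_samples=2000, sample_size=sample_size)

# Population mean and SD are both 1, so theory predicts the sample means
# centre near 1 with standard error 1 / sqrt(50), about 0.14
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
se = 1 / math.sqrt(sample_size)

# Under approximate normality, about 95% of the means fall within 1.96 SE of 1
coverage = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)
print(mean_of_means, sd_of_means, coverage)
```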
73
Exponential
useful in describing times to events and population growth
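For time-to-event data with a constant event rate $\lambda$ (hazard), the exponential model gives $P(T > t) = e^{-\lambda t}$ and a median time of $\ln 2 / \lambda$. Below is a minimal sketch of those two formulas. The rate of 0.1 events per person-year is a hypothetical value.

```python
import math

def exponential_survival(t, rate):
    """P(T > t) for an exponential time-to-event with constant rate `rate`."""
    return math.exp(-rate * t)

def exponential_median(rate):
    """Time by which half of the units are expected to have had the event."""
    return math.log(2) / rate

rate = 0.1  # hypothetical: 0.1 events per person-year
print(exponential_survival(5, rate))  # probability of remaining event-free past 5 years
print(exponential_median(rate))       # median time to event, in years
```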
74
Counting techniques
Factorial ►Permutations ►Combinations
75
Factorial
“$n$ factorial” = number of possible arrangements (orderings) of $n$ objects ►Notation: “$n$ factorial” = $n!$
76
Permutation
an ordered arrangement of $n$ objects taken $r$ at a time
77
Combination
a selection of $n$ objects taken $r$ at a time without regard to order
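Python's standard library provides all three counting techniques directly: `math.factorial`, `math.perm`, and `math.comb` (Python 3.8 or later). The values n = 5 and r = 3 are arbitrary. The final check reflects the link between the last two cards: each unordered selection of r objects can be arranged in r! orders.

```python
import math

n, r = 5, 3
print(math.factorial(n))  # 120: orderings of 5 objects, 5!
print(math.perm(n, r))    # 60: ordered arrangements of 3 taken from 5, n! / (n - r)!
print(math.comb(n, r))    # 10: unordered selections of 3 from 5, n! / (r! (n - r)!)

# Each combination corresponds to r! permutations of the same r objects
assert math.perm(n, r) == math.comb(n, r) * math.factorial(r)
```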
78
Poisson
Describes the totally random (haphazard) occurrences of events in time or objects in space