Module 1 Flashcards

1
Q

What is a population? What does the population contain?

A
  • It’s everything you care about
  • Could be a group of existing individuals/objects
  • Could be a hypothetical and potentially infinite group of individuals/objects

The population contains the TRUTH

2
Q

What is a sample?

A
  • It’s a subset of everything you care about
  • We make inferences about the population from the sample because it is often impossible to measure everything about a population
3
Q

What two things should samples be?

A
  1. Random
  2. Representative (i.e. accurately portray the distribution of the population)
4
Q

We use samples to __________(1) about populations. Assuming you know everything about the population, then ______(2) will help us understand something about the sample.

A

(1) make inferences
(2) probability

5
Q

What are the three steps of the scientific method?

A
  1. Define prior beliefs: These are competing hypotheses (H0, H1, H2, etc.) about the state of the universe (i.e., “population”). Before collecting any data, what do I believe about the world?
  2. Experiment: Design an experiment to generate objective data (i.e. take a SAMPLE from the population) to learn about the state of the universe.
  3. Use evidence to update prior beliefs: The data will support some hypotheses more than others. Make claims about the population based on the sample. Depending on the strength of the prior belief, you may need more or less compelling data.
6
Q

What is the likelihood function? What is the notation?

A

The likelihood function gives the probability of the observed outcome for a particular value of the unknown truth; it is the measure of the quantitative evidence about that truth.

Ex. What is the probability of getting 3 heads if the coin has 0 heads? The answer is 0.

NOTATION:
P(HHH | H1 is true) = ___
What is the probability of 3 heads if H1 is true?

The likelihood function helps us evaluate how well each hypothesis is supported now that we have collected some data
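The coin example above can be sketched in a few lines. This is only an illustration: it assumes the hypotheses H0, H1, H2 stand for a two-sided coin with 0, 1, or 2 heads (the labels and the function name are hypothetical, not from the course notes).

```python
# Assumption: H0, H1, H2 mean the two-sided coin has 0, 1, or 2 heads.
hypotheses = {"H0": 0, "H1": 1, "H2": 2}


def likelihood_three_heads(num_heads_on_coin):
    """P(HHH | hypothesis): three independent tosses all land heads."""
    p_heads = num_heads_on_coin / 2  # chance of heads on one toss
    return p_heads ** 3


likelihoods = {name: likelihood_three_heads(k) for name, k in hypotheses.items()}
for name, value in likelihoods.items():
    print(f"P(HHH | {name} is true) = {value:.3f}")
```

The hypothesis with the largest value is the one under which the observed data (HHH) would be most likely.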

7
Q

Evidence is always ______(1): It supports one hypothesis ______(1) to another. As such, ______(1) likelihood is more important. Evidence is used to update _______________(2).

A

(1) Relative
(2) Prior beliefs

8
Q

When drawing a “likelihood of experimental result” graph, what are the x and y axis labels? Do you connect the data points with a line?

A

x axis: Hypothesized truth
- Label the axis as H0, H1, H2
Ex. Hypothesized truth: # of heads on coin

y axis: Likelihood of data
- Choose ONE outcome and use appropriate notation
- Use decimals from 0.00-1.00
Ex. Likelihood of data: P(HHH)

You do not connect the data points with a line

9
Q

What information can you take away from a “likelihood of experimental result” graph?

A

Based on the hypothesis with the highest likelihood (i.e. the hypothesis that maximizes the likelihood), we can determine the unknown truth of the world that would make the data we observed most likely to occur. It doesn’t mean that this is for sure what happened, but the data would be most likely to occur if this hypothesis is true.

10
Q

What is the formula for Bayes theorem?

A

Posterior odds = (prior odds) x (likelihood ratio)

P(H2 | Data) / P(H1 | Data) = [P(H2) / P(H1)] x [P(Data | H2) / P(Data | H1)]
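A minimal sketch of the odds form of Bayes theorem follows. The function name and the numbers (equal priors, data eight times more likely under H2) are made up for illustration.

```python
def posterior_odds(prior_h2, prior_h1, lik_h2, lik_h1):
    """Posterior odds of H2 versus H1 = prior odds x likelihood ratio."""
    prior_odds = prior_h2 / prior_h1
    likelihood_ratio = lik_h2 / lik_h1
    return prior_odds * likelihood_ratio


# Hypothetical inputs: P(H2) = P(H1) = 0.5, P(Data | H2) = 1.0, P(Data | H1) = 0.125
odds = posterior_odds(prior_h2=0.5, prior_h1=0.5, lik_h2=1.0, lik_h1=0.125)
print(odds)  # 8.0
```

With equal priors the prior odds are 1, so the posterior odds equal the likelihood ratio.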

11
Q

What are odds? What do odds of greater than or less than one indicate about the events?

A
  • Odds are a way to express the likelihood that an event occurs
  • If odds > 1, then the top event (H2) is more likely
  • If odds < 1, then the bottom event (H1) is more likely
12
Q

How can you convert from odds to probability?

A

Prob = (odds)/(1+odds)
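As a quick check of the conversion, here is a small sketch in both directions. The function names are illustrative; the reverse direction uses the definition of odds as the probability of an event occurring over it not occurring.

```python
def odds_to_probability(odds):
    """Prob = odds / (1 + odds)."""
    return odds / (1 + odds)


def probability_to_odds(prob):
    """Odds = P(event) / P(not event); undefined when prob is 1."""
    return prob / (1 - prob)


print(odds_to_probability(3))     # 0.75
print(probability_to_odds(0.75))  # 3.0
```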

13
Q

What are the two types of probability?

A
  1. FREQUENTIST: long term frequency
    - Consider events over a long period of time and based on the frequency of their occurrence, we can infer something about the probability of each event.
    - Repeat same experiment over and over and look at the proportion of times the event happens
    - Examples: Coin tosses(probability coin lands “heads”), disease prevalence(probability a randomly selected person has the disease)
  2. BAYESIAN (subjectivist): measure of personal belief
    - Does not mean that it is not informed by data
    - Start by defining a prior belief (established without any information to back it up) and then use the observed data to update the prior belief to form the posterior belief.
14
Q

How do you calculate P(A and B)?

A

P(A and B) = P(A) x P(B) (assuming the events are independent)

In general, P(A and B) = P(A) x P(B | A)

15
Q

How do you calculate P(A or B)?

A

P(A or B) = P(A) + P(B) (assuming the events are mutually exclusive)

16
Q

What does it mean if two events are mutually exclusive?

A

It means that the two events cannot occur at the same time.

17
Q

What does it mean if two events are independent?

A

Events are independent if knowing whether one occurred tells you nothing about whether the other one occurred

18
Q

Probability assigns ________(1) to any set of possible events(outcomes) of an experiment.

A

(1) a number in [0,1]

19
Q

What is P(11) for the sum of the faces in a toss of two fair, standard dice?

A

Possible outcomes: (1, 2, 3, 4, 5, 6) x (1, 2, 3, 4, 5, 6) = 36 possible outcomes

Based on the physical model of the mechanism, there is a 1/36 chance of getting a particular combination

P(11) = P((5,6) or (6,5)) = P(5,6) + P(6,5) = (1/36) + (1/36) = 1/18
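The count can be double-checked by listing all 36 outcomes. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die); each is equally likely for fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose faces sum to 11.
favourable = [pair for pair in outcomes if sum(pair) == 11]

p_eleven = Fraction(len(favourable), len(outcomes))
print(favourable)  # [(5, 6), (6, 5)]
print(p_eleven)    # 1/18
```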

20
Q

We can use physical models to determine probability. Give an example of how this could apply to a coin toss.

A

When we toss a (fair) coin, there are two equally likely possible outcomes: heads(H) or tails(T)

By the physical model of the mechanism(a fair coin):
- Each of the two outcomes is equally likely, so P(H) = P(T)
- One of those outcomes must occur, so P(H) + P(T) = 1
- Therefore, P(H) = P(T) = 0.5

21
Q

What is a random variable?

A

A random variable is a numeric function of the outcomes of an experiment
- Ex. Flip a coin 5 times. Let X=number of heads

22
Q

What is a discrete probability function?

A

A discrete probability function describes the probabilities associated with each possible value of the discrete random variable.

You make a chart with the first row having all the possible values of the discrete random variable. The second row contains the probability that the random variable equals each of those possible values.

23
Q

A random variable is discrete if _______.

A

it can only assume a countable number of possible values

24
Q

What are joint probability distributions?

A
  • Joint probability distributions describe how the outcomes of two experiments behave together (considering two experiments at the same time)
  • We summarize joint behaviour in a two-way contingency table
25
Q

What is a joint probability?

A

The probability of 2 outcomes from 2 different experiments occurring at the same time

  • Always dividing by the total number of people in the whole experiment
26
Q

What is a marginal probability?

A

Consider only one of the outcomes; the other is not known. This means looking at only one probability in a contingency table.

  • Always dividing by the total number of people in the whole experiment
27
Q

What is a conditional probability? What is the formula for calculating conditional probability?

A

We want to know the probability of one outcome in one experiment given the outcome of the other experiment.

P(A | B) = P(A and B) / P(B)

Note: to calculate P(A and B) here, don’t multiply the probabilities; just determine the value from the table
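To see how joint, marginal, and conditional probabilities come out of a two-way contingency table, here is a sketch with invented counts. The table, its labels, and the variable names are hypothetical.

```python
from fractions import Fraction

# Hypothetical 2x2 table of counts: (exposure, disease status) -> number of people.
counts = {
    ("exposed", "disease"): 30,
    ("exposed", "no disease"): 70,
    ("unexposed", "disease"): 10,
    ("unexposed", "no disease"): 90,
}
total = sum(counts.values())  # everyone in the whole experiment

# Joint probability: read one cell from the table and divide by the total.
p_exposed_and_disease = Fraction(counts[("exposed", "disease")], total)

# Marginal probability: add up one row (ignore disease status), divide by the total.
p_exposed = Fraction(
    sum(n for (exposure, _), n in counts.items() if exposure == "exposed"), total
)

# Conditional probability: P(disease | exposed) = P(exposed and disease) / P(exposed).
p_disease_given_exposed = p_exposed_and_disease / p_exposed
print(p_exposed_and_disease, p_exposed, p_disease_given_exposed)  # 3/20 1/2 3/10
```

Note that the joint probability is taken straight from the table cell rather than by multiplying two marginal probabilities.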

28
Q

What is relative risk/risk ratio? How would you calculate the relative risk of A compared to B?

A

Allows us to compare conditional probabilities.

Relative risk of A compared to B = (risk of A) / (risk of B)

29
Q

What type of probability are sensitivity and specificity?

A

Both sensitivity and specificity are conditional probabilities that depend on knowing the true disease status.

30
Q

What type of probability are PPV and NPV?

A

Both are conditional probabilities that depend on knowing the status of the test.

31
Q

Are sensitivity and specificity prior or posterior beliefs? What about PPV and NPV?

A

Sensitivity and specificity are prior beliefs. PPV and NPV are posterior beliefs

32
Q

How can you reduce the probability of a false negative?

A

Increase sensitivity

33
Q

How can you reduce the probability of a false positive?

A

Increase specificity

34
Q

What is a random variable?

A

A numeric function of the outcomes of an experiment

35
Q

What does it mean if a random variable is discrete? What is a discrete probability function?

A

A random variable is discrete if it can only assume a countable number of possible values. A discrete probability function describes how to calculate probabilities about a discrete random variable

36
Q

What do measures of central tendency tell us? List the two measures of central tendency.

A

Central tendency tells us what a typical value from the distribution looks like. The two measures of central tendency are the mean and the median.

37
Q

What do measures of spread/variability tell us? List the two measures of spread/variability.

A

Measures of spread/variability tell us the degree of dispersion/noise in the data. The two measures of spread/variability are variance and standard deviation.

38
Q

Why is it useful to know the shape of a distribution? What are the three shapes that a distribution can take?

A

It is useful for describing how values are apportioned (i.e. divided) throughout the range of the distribution. The three shapes are symmetric, positively skewed (right skew), and negatively skewed (left skew).

39
Q

What are the three characteristics that can be used to describe a distribution?

A
  1. Measure of central tendency
  2. Measures of spread/variability
  3. Types of shapes
40
Q

What is the definition of the mean/average? What is another term used to describe the mean of a distribution?

A

The mean is the weighted average of possible values, weighted by probabilities. It is also called the “expected value” of the distribution.

Formula sheet has the formula.

41
Q

Does the average have to be an observable value in the dataset?

A

No

42
Q

What is the median? What two conditions have to be met for a value to be the median?

A

The median is the “middle” value (or interval). It divides possible values into two pieces of equal probability. For discrete probability distributions, this means that the median can be a single value in the dataset or it can be an interval.

The two conditions are listed on the notes sheet.

43
Q

What is the variance, and what does the value tell us?

A

Variance is the weighted average squared difference between a value and the mean. The variance quantifies the degree to which values in the distribution can vary. The value of variance is not interpretable in and of itself. This is because the units of variance are units squared.

Formula listed in formula sheet

44
Q

What is the standard deviation, and what does the value tell us? What is a disadvantage of variance? What is the formula for standard deviation?

A

The standard deviation is the square root of the variance. It removes the “squared units”, which gives us a more directly understandable measure of the dispersion of the data.

It is roughly the average distance of the random variable from the mean of the distribution. The greater the standard deviation, the greater the variability in distribution values. SD can be influenced by extreme values.

Formula is in the notes sheet
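The verbal definitions of mean, variance, and standard deviation for a discrete distribution can be sketched as below. The example distribution (number of heads in two fair coin tosses) and the function names are assumptions chosen for illustration; the formula sheet remains the reference for notation.

```python
import math

# Hypothetical distribution: X = number of heads in two tosses of a fair coin.
# Keys are the possible values, entries are their probabilities.
distribution = {0: 0.25, 1: 0.50, 2: 0.25}


def mean(dist):
    """Weighted average of the possible values, weighted by probability."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Weighted average squared difference between each value and the mean."""
    mu = mean(dist)
    return sum(prob * (value - mu) ** 2 for value, prob in dist.items())


def standard_deviation(dist):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(dist))


print(mean(distribution), variance(distribution), standard_deviation(distribution))
```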

45
Q

Draw a symmetric distribution. How do the mean and median compare?

A

Refer to notes sheet for drawing.

For a symmetric distribution, mean = median generally, but it is better to say that the mean is a possible median (in case the median is an interval).

46
Q

Draw a positively skewed(right) distribution. How do the mean and median compare?

A

Refer to notes for drawing.

mean>median

47
Q

Draw a negatively skewed (left) distribution. How do the mean and median compare?

A

Refer to notes for drawing

mean<median

48
Q

When is the mean a better measure of a “typical” value?

A

Symmetric distribution
- Mean and median are the same/close and the mean has nicer mathematical properties

49
Q

When is the median a better measure of a typical value?

A

Skewed distribution or presence of extreme values
- The mean is more sensitive to extreme values

50
Q

The interquartile range defines ______(1). The median ____(2). Each half contains ____(3).

Draw this observation on the box plot

A

(1) the central 50% of the observations
(2) divides the distribution in half
(3) half the observations in the distribution

Refer to notes for drawing

51
Q

A box plot reflects the _____(1) of the distribution.

A

(1) symmetry/asymmetry

52
Q

The box plot provides much of the same info about the distribution shape as the histograms, BUT additionally ___(1).

A

(1) provides insight into distribution milestones (e.g. the median and quartiles) that the histogram does not

53
Q

Draw a box plot for a symmetric distribution and label the whiskers, third/upper quartile, median, lower quartile, and the interquartile range.

A

Refer to notes

54
Q

Draw the box plot for a positively skewed distribution. Label the positive tail.

A

Refer to notes

55
Q

Draw the box plot for a negatively skewed distribution. Label the negative tail.

A

Refer to notes

56
Q

Outliers extend ___(1) and are represented by ___(2)

A

(1) outside the whiskers
(2) dots

57
Q

What do whiskers represent?

A

Whiskers extend to the most extreme data point that is no further than one and a half times the interquartile range from the box

58
Q

What does the third/upper quartile represent?

A

75% of the observations in the distribution have a value that is less than or equal to the upper quartile

59
Q

What does the lower quartile represent?

A

25% of the observations in the distribution have a value that is less than or equal to the lower quartile.

60
Q

What does the interquartile range represent?

A

50% of the observations have a value between the lower and upper quartiles.

61
Q

What are the three assumptions of the binomial distribution?

A
  1. There must be a fixed number of trials
  2. The probability of success on a single trial must be constant
  3. The trials must be independent of each other
62
Q

What is the notation for a binomial distribution?

A

X ~ Binomial(n,p)

X is distributed as a binomial variable with parameters n & p

63
Q

On the formula for a binomial distribution, label x, n, combination, probability of success, probability of failure, # of trials that were not successes.

A

Refer to notes page.

64
Q

How can you calculate mean, variance, and standard deviation knowing the parameters for a binomial distribution?

A

Mean : E[X] = np

Variance: Var(X) = np(1-p)

SD: SD(X) = sqrt(np(1-p))
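These three formulas translate directly into code. The sketch below uses made-up parameters (n = 10, p = 0.5) and a hypothetical helper name.

```python
import math


def binomial_summary(n, p):
    """Return (mean, variance, SD) for X ~ Binomial(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    sd = math.sqrt(variance)
    return mean, variance, sd


# Hypothetical example: 10 trials, probability of success 0.5 on each.
mean, variance, sd = binomial_summary(n=10, p=0.5)
print(mean, variance, sd)
```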

65
Q

What does the cdf function on the calculator tell us?

A

the probability of “c or fewer” successes
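For intuition, "c or fewer successes" can be computed by adding up the binomial probabilities for 0, 1, ..., c. This sketch is not the calculator's implementation; the function names are illustrative and it assumes the standard binomial formula (combination x p^x x (1-p)^(n-x)).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(c, n, p):
    """P(X <= c): probability of c or fewer successes in n trials."""
    return sum(binomial_pmf(x, n, p) for x in range(c + 1))


# Hypothetical example: 3 fair coin tosses, at most 1 head.
print(binomial_cdf(1, n=3, p=0.5))  # 0.5
```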

66
Q

Does the median need to be an observable value in the dataset?

A

yes

67
Q

When do you use binomial likelihood?

A

When the observed data are generated from a binomial distribution, but you don’t know the value of p

68
Q

What is the maximum likelihood estimate?

A

The value of p that maximizes the likelihood function.

69
Q

What is the formula for binomial likelihood? What is the process for determining the maximum likelihood estimate?

A

L(p) ∝ p^x (1-p)^(n-x)

Sub in different values for p and determine which has the highest L(p)
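The "sub in different values" process can be sketched as a search over a grid of candidate values of p. The data (x = 7 successes in n = 10 trials) and the grid spacing of 0.01 are assumptions for illustration.

```python
def binomial_likelihood(p, x, n):
    """L(p), up to a constant: p^x (1-p)^(n-x)."""
    return p**x * (1 - p) ** (n - x)


# Hypothetical data: 7 successes in 10 trials.
x, n = 7, 10

# Candidate values of p: 0.00, 0.01, ..., 1.00.
candidates = [i / 100 for i in range(101)]

# Keep the candidate with the highest likelihood.
best_p = max(candidates, key=lambda p: binomial_likelihood(p, x, n))
print(best_p)
```

For these numbers the grid search lands on the observed proportion x/n.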

70
Q

How do you calculate odds?

A

Odds = (probability of the event occurring) / (probability of it not occurring)

71
Q

How can you interpret risk ratio?

A

If 0 ≤ RR < 1
PERCENT DECREASE
To calculate % decrease: (1-RR) x 100%

If 1 < RR < 2
PERCENT INCREASE
To calculate % increase: (RR-1) x 100%

If RR ≥ 2
FOLD/MULTIPLICATIVE INCREASE
Calculation: report the ratio itself (e.g. RR = 3 is a 3-fold increase)
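The three cases can be wrapped in a small helper. This is a sketch only: the function name and the wording of the output are invented, and the RR = 1 case ("no difference") is an assumption, since the card does not cover it.

```python
def interpret_risk_ratio(rr):
    """Turn a risk ratio into the percent/fold wording used on the card."""
    if rr < 0:
        raise ValueError("A risk ratio cannot be negative")
    if rr < 1:
        return f"{(1 - rr) * 100:.0f}% decrease"
    if rr == 1:
        return "no difference"  # assumption: not stated on the card
    if rr < 2:
        return f"{(rr - 1) * 100:.0f}% increase"
    return f"{rr:g}-fold increase"


print(interpret_risk_ratio(0.75))  # 25% decrease
print(interpret_risk_ratio(1.5))   # 50% increase
print(interpret_risk_ratio(3))     # 3-fold increase
```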