4 - In All Probability Flashcards
What does probability deal with?
Reasoning in the presence of uncertainty
What is the Monty Hall dilemma?
A probability problem involving three doors, one hiding a car and two hiding goats
What is the initial probability of choosing the car behind Door No. 1?
One-third
What does the host do after you pick a door in the Monty Hall dilemma?
Opens another door revealing a goat
What should you do according to Marilyn vos Savant regarding switching doors?
Yes; you should switch
What is the probability of winning if you switch doors?
Two-thirds
What is the probability of winning if you do not switch doors?
One-third
Who was outraged by vos Savant’s answer to the Monty Hall dilemma?
Mathematicians and PhDs from American universities
What did Paul Erdős initially believe about switching doors in the Monty Hall dilemma?
He believed it made no difference
What did Andrew Vázsonyi use to convince Erdős that switching doors was advantageous?
A computer program running simulations
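Vázsonyi's simulation is easy to re-create. Below is a minimal sketch in Python (not his original program; all names are illustrative) that plays many rounds and compares the win rates of staying versus switching:

```python
import random

def play(switch: bool) -> bool:
    """Play one round of Monty Hall; return True if the player wins the car."""
    car = random.randrange(3)    # door hiding the car
    pick = random.randrange(3)   # contestant's initial choice
    # The host opens a door that is neither the pick nor the car.
    host = next(d for d in range(3) if d != pick and d != car)
    if switch:
        pick = next(d for d in range(3) if d != pick and d != host)
    return pick == car

trials = 100_000
stay = sum(play(False) for _ in range(trials)) / trials
swap = sum(play(True) for _ in range(trials)) / trials
print(f"stay: {stay:.3f}  switch: {swap:.3f}")  # roughly 0.333 vs 0.667
```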
What are the two main approaches to thinking about probability discussed in the text?
Frequentist and Bayesian
What does the frequentist approach involve?
Dividing the number of times an event occurs by the total number of trials
What is Bayes’s theorem used for?
To draw conclusions with mathematical rigor amid uncertainty
What is the prior probability of having a disease if it occurs in 1 in 1,000 people?
0.001
What does P(H) represent in Bayes’s theorem?
The prior probability of a hypothesis being true
What does P(E|H) represent in Bayes’s theorem?
The probability of the evidence given the hypothesis
What is the posterior probability?
The prior probability updated given the evidence
If a test has a 90% accuracy, what is the probability of having the disease given a positive test result?
0.89 percent
What happens to the posterior probability if the test accuracy increases to 99%?
It rises to 0.09, or almost a 1-in-10 chance
What is the significance of Thomas Bayes’s contributions?
He laid the foundation for Bayesian probability and statistics
What happens if the disease becomes more common with the same test accuracy?
If the disease occurs in 1 in 100 people (with the same 99 percent accuracy), the probability of having the disease given a positive test rises to 0.5, or 50 percent
What is the probability that the car is behind Door No. 1 after the host opens Door No. 3?
1/3; it is calculated using Bayes's theorem, as worked through in the cards below
What is Bayes’s theorem formula?
P(H|E) = P(E|H) × P(H) / P(E)
What is P(E)?
The probability of testing positive
How do you calculate P(E)?
Sum of probabilities of testing positive from both having and not having the disease
What does the term ‘sensitivity’ refer to in the context of a medical test?
The probability that the test is positive when the subject has the disease
What does ‘specificity’ refer to in the context of a medical test?
The probability that the test is negative when the subject does not have the disease
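Putting the last several cards together, here is a minimal sketch of the test calculation, assuming (as the cards do) that "accuracy" means sensitivity and specificity both equal the stated figure:

```python
def posterior(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test), via Bayes's theorem."""
    true_pos = sensitivity * prevalence               # P(E|H) * P(H)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(E|not-H) * P(not-H)
    return true_pos / (true_pos + false_pos)          # divide by P(E)

print(posterior(0.001, 0.90, 0.90))  # ~0.0089: 90% accuracy, 1-in-1,000 disease
print(posterior(0.001, 0.99, 0.99))  # ~0.09:   accuracy raised to 99%
print(posterior(0.01,  0.99, 0.99))  # 0.5:     same 99% test, disease now 1 in 100
```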
What is the prior probability that the car is behind Door No. 1?
1/3
What is the probability that the host opens Door No. 3 if the car is behind Door No. 1?
1/2
What is P1 in the context of the probability that the host opens Door No. 3?
P(C1) × P(H3|C1) = 1/3 × 1/2 = 1/6
What is the probability that the host opens Door No. 3 if the car is behind Door No. 2?
1
What is P2 in the context of the probability that the host opens Door No. 3?
P(C2) × P(H3|C2) = 1/3 × 1 = 1/3
What is P3 in the context of the probability that the host opens Door No. 3?
P(C3) × P(H3|C3) = 1/3 × 0 = 0
What is the total probability that the host opens Door No. 3?
1/2
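As a quick arithmetic check of the cards above, the posteriors follow by dividing each path by the total probability that the host opens Door No. 3:

```python
from fractions import Fraction

p_h3 = Fraction(1, 6) + Fraction(1, 3) + 0  # P(H3) = P1 + P2 + P3 = 1/2
print(Fraction(1, 6) / p_h3)  # P(C1|H3) = 1/3: staying wins one time in three
print(Fraction(1, 3) / p_h3)  # P(C2|H3) = 2/3: switching doubles your chances
```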
What should you do after the host opens Door No. 3, revealing a goat?
Switch doors
True or False: Most machine learning is inherently deterministic.
False
What does the perceptron algorithm find?
A hyperplane that can divide the data
What is a random variable?
A number assigned to the outcome of an experiment
What type of distribution is a Bernoulli distribution?
A discrete distribution over two outcomes: the random variable takes the value 1 with probability p and 0 with probability 1 - p
In a Bernoulli distribution, what is the probability mass function P(X)?
P(X=1) = p and P(X=0) = 1 - p
What is the expected value of a random variable?
The probability-weighted average of its possible values; the value you would see on average over a large number of trials
How is variance calculated?
The sum, over all values x of X, of (x - E(X))² × P(X = x)
What is the standard deviation?
The square root of the variance
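As a sketch of these three definitions for a discrete random variable (the Bernoulli values and p = 0.6 below are illustrative):

```python
values = [0, 1]     # outcomes of a Bernoulli random variable
probs = [0.4, 0.6]  # PMF with p = 0.6, chosen for illustration

mean = sum(x * p for x, p in zip(values, probs))               # E(X) = 0.6
var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))  # sum of (x - E(X))^2 * P(x)
std = var ** 0.5                                               # square root of variance
print(mean, var, std)  # 0.6 0.24 ~0.49
```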
What shape does the normal distribution have?
A bell-shaped curve
What percentage of observed values lie within one standard deviation of the mean in a normal distribution?
68 percent
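That 68 percent figure is easy to verify empirically; a minimal sketch using NumPy, with an arbitrary mean and standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 2.0  # arbitrary mean and standard deviation
samples = rng.normal(mu, sigma, 1_000_000)
print(np.mean(np.abs(samples - mu) <= sigma))  # ~0.683
```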
What is the variance in relation to the standard deviation?
Variance is the square of the standard deviation
What does a larger standard deviation indicate about the distribution?
A broader, squatter plot
What is the mean of the distribution also known as?
Expected value
What is the probability of X = 0 in the coin toss experiment with 10 trials if 6 heads and 4 tails were observed?
0.6 (the empirical estimate, with heads encoded as X = 0: 6 heads in 10 tosses)
What is the probability of X = 1 in the coin toss experiment with 10 trials if 6 heads and 4 tails were observed?
0.4 (the empirical estimate, with tails encoded as X = 1: 4 tails in 10 tosses)
What does the expected value E(X) represent?
The average outcome of the random variable over many trials
Fill in the blank: The theoretical probability of getting heads on a single coin toss is ______.
1/2
What does sampling from an underlying distribution help us understand in machine learning?
That the data we have is only a sample, which we hope is representative of the underlying distribution
What is the relationship between the number of trials and the expected difference in counts of heads and tails?
On the order of the square root of the total number of trials
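A small simulation illustrates that square-root growth (the trial counts below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    heads = rng.binomial(n, 0.5, size=200)  # 200 repeats of n fair tosses
    gap = np.abs(2 * heads - n).mean()      # |heads - tails|, since tails = n - heads
    print(n, round(gap, 1), round(n ** 0.5, 1))  # the gap tracks sqrt(n)
```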
What characterizes a discrete random variable?
A discrete random variable is characterized by its probability mass function (PMF).
What characterizes a continuous random variable?
A continuous random variable is characterized by its probability density function (PDF).
Can you determine the probability of a specific value for a continuous random variable?
No, the probability of a specific, infinitely precise value is actually zero.
How is the probability that a continuous random variable falls within a range determined?
It is given by the area under the probability density function (PDF) bounded by the endpoints of that range.
What is the total area under a probability density function (PDF)?
The total area under the entire PDF equals 1.
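As a sketch of these last few cards, the probability that a standard normal variable lands in a range, computed from areas under its PDF (this assumes SciPy is available):

```python
from scipy.stats import norm

# P(-1 <= X <= 1) is the area under the standard normal PDF over that range.
print(norm.cdf(1) - norm.cdf(-1))  # ~0.683, the one-standard-deviation figure
print(norm.cdf(float("inf")))      # 1.0: the total area under the PDF
```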
What parameters are needed for the Bernoulli distribution?
The probability p.
What parameters are needed for the normal distribution?
The mean and variance.
In supervised learning, what does each instance of data represent?
Each instance of data is a d-dimensional vector.
In the context of supervised learning, what does the label y indicate?
y is -1 if the person did not have a heart attack, and 1 if they did.
What is the underlying probability distribution denoted as in supervised learning?
P(X, y).
What is the Bayes optimal classifier?
It is a classifier that predicts the category with the higher probability based on the underlying distribution.
What is maximum likelihood estimation (MLE)?
MLE estimates the best underlying distribution that maximizes the likelihood of observing the data.
What is the difference between MLE and MAP?
MLE maximizes P(D | θ), while MAP maximizes P(θ | D).
What does MAP stand for?
Maximum a posteriori estimation.
What is a common assumption made in Bayesian statistics?
That θ follows a distribution, meaning it is treated as a random variable.
What does the term ‘prior distribution’ refer to in Bayesian statistics?
It refers to the prior belief about the value of θ before observing the data.
What is a concrete example of a distribution characterized by parameters?
A Bernoulli distribution characterized by the value p.
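For that Bernoulli coin, both estimators have simple closed forms. A sketch, where the Beta(alpha, beta) prior on p is an added assumption of mine (any prior could be used):

```python
def mle_p(heads: int, n: int) -> float:
    """MLE: the observed fraction of heads maximizes P(D | theta)."""
    return heads / n

def map_p(heads: int, n: int, alpha: float = 2.0, beta: float = 2.0) -> float:
    """MAP under an assumed Beta(alpha, beta) prior: the mode of P(theta | D)."""
    return (heads + alpha - 1) / (n + alpha + beta - 2)

print(mle_p(6, 10))  # 0.6
print(map_p(6, 10))  # ~0.583, pulled toward the prior's belief in a fair coin
```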
What is a key feature of the Gaussian distribution?
It is characterized by its mean and variance.
What approach is often used when there is no closed-form solution to a maximization problem?
Gradient descent.
How do MLE and MAP behave as the amount of sampled data grows?
They begin converging in their estimate of the underlying distribution.
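Reusing the coin sketch above (with the same assumed Beta(2, 2) prior), a quick check of that convergence:

```python
import random

random.seed(0)
true_p, heads = 0.7, 0
for n in range(1, 10_001):
    heads += random.random() < true_p     # one more toss of a biased coin
    if n in (10, 100, 10_000):
        mle = heads / n                   # maximizes P(D | theta)
        map_ = (heads + 1) / (n + 2)      # mode under the Beta(2, 2) prior
        print(n, round(mle, 3), round(map_, 3))  # the two estimates converge
```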
Who were the two statisticians that first used Bayesian reasoning for authorship attribution?
Frederick Mosteller and David Wallace.
What problem did Mosteller and Wallace tackle using Bayesian reasoning?
The authorship of the disputed Federalist Papers.
What was the primary reason for the dispute over the authorship of the Federalist Papers?
Madison and Hamilton were in no hurry to enter their claims, and by the time they did they had become bitter political enemies
What was the outcome of Mosteller and Williams’ initial analysis of sentence lengths in the Federalist Papers?
The average lengths for Hamilton and Madison were practically identical, providing little discriminatory power.
What statistical measure did Mosteller and Williams calculate to analyze sentence lengths?
Standard deviation (SD).
What were the average sentence lengths for Hamilton and Madison?
34.55 and 34.59 words, respectively
What were the standard deviations of sentence lengths for Hamilton and Madison?
19 for Hamilton and 20 for Madison
What did Mosteller use as a teaching moment to educate his students on?
The difficulties of applying statistical methods
Who collaborated with Mosteller in the mid-1950s to explore Bayesian methods?
David Wallace
What did Douglass Adair suggest to Mosteller regarding The Federalist Papers?
To revisit the issue of authorship
What type of words did Mosteller and Wallace focus on for their analysis?
Function words
How did Mosteller and Wallace initially count the occurrence of function words?
By typing each word on a long paper tape
What issue did Mosteller encounter with the computer program used for counting?
It would malfunction after processing about 3,000 words
What method did Mosteller and Wallace use to calculate authorship probability?
Bayesian analysis
What was the outcome of Mosteller and Wallace’s analysis regarding the disputed papers?
Overwhelming evidence for Madison’s authorship
What were the odds for Madison’s authorship of paper number 55?
80 to 1
What was the significance of Mosteller and Wallace’s work according to Patrick Juola?
It was a seminal moment for statisticians and was done objectively
What species of penguins were studied in the Palmer Archipelago?
Adélie, Gentoo, and Chinstrap
How many attributes were considered for each penguin in the study?
Five attributes
What is the function that the ML algorithm needs to learn?
f(x) = y
What is the problem with the assumption of linearly separable data?
It may not hold true with more data
What does Bayesian decision theory establish?
The bounds for the best predictions given the data
What does the histogram of Adélie penguins’ bill depth show?
The distribution of bill depths
What type of probability is calculated for a specific value of bill depth?
Class-conditional probability
What is Bayes’s theorem used for in the context of the penguin study?
To calculate the probabilities for each hypothesis
What is the prior probability that a penguin is a Gentoo based on the sample?
119/(119+146)
What is P(y = Gentoo)?
The prior probability that the penguin is a Gentoo, estimated as 119 / (119 + 146) = 0.45.
How is P(x | y = Gentoo) determined?
It is read off from the class-conditional distribution of bill depths for Gentoo penguins, i.e., the Gentoo part of the plotted distribution.
What does P(x) represent?
The probability that the bill has some particular depth, summed over both species: P(x) = P(x | Adélie) × P(Adélie) + P(x | Gentoo) × P(Gentoo)
What is P(y = Gentoo | x)?
The posterior probability that the penguin is a Gentoo, given some bill depth x.
What is the Bayes optimal classifier?
In this example, a classifier that uses a single feature (bill depth) and predicts whichever of the two species, Gentoo or Adélie, has the higher posterior probability.
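A sketch of that classifier, modeling each species' bill depths with a normal distribution; the means, standard deviations, and test values below are illustrative placeholders, not the study's fitted numbers:

```python
from scipy.stats import norm

# Placeholder class-conditional normals for bill depth (mm).
models = {"Gentoo": norm(15.0, 1.0), "Adelie": norm(18.3, 1.2)}
priors = {"Gentoo": 119 / 265, "Adelie": 146 / 265}  # priors from the sample counts

def classify(x: float) -> str:
    """Pick the species with the larger P(x | y) * P(y); P(x) is common
    to both species, so it drops out of the comparison."""
    return max(models, key=lambda s: models[s].pdf(x) * priors[s])

print(classify(14.5))  # Gentoo
print(classify(18.5))  # Adelie
```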
True or False: The Bayes optimal classifier is the best any ML algorithm can do.
True.
What does the term ‘posterior probability’ refer to?
The probability of a hypothesis after considering the evidence.
What limitations exist when estimating underlying distributions in machine learning?
We often do not have access to the true underlying distribution.
What are maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation used for?
To approximate underlying distributions from a sample of data.
What happens when bill depth is used to distinguish Adélie from Chinstrap penguins?
They are indistinguishable using only bill depth.
What additional feature can improve classification between penguin species?
Bill length.
What is a probability density function (PDF)?
A function describing the relative likelihood of a continuous random variable taking values near a given point; probabilities come from areas under the curve, not from its value at a single point.
How does increasing the number of features affect the complexity of estimating probability distributions?
It increases the complexity and data requirements for accurate estimation.
Fill in the blank: If we have five features, each penguin can be represented as a vector in _______ space.
5D
What assumption simplifies the problem of estimating probability distributions in machine learning?
That all features are sampled independently from their own distributions.
What is a naïve Bayes classifier?
A classifier that assumes mutually independent features to simplify probability calculations.
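A minimal naïve Bayes sketch with two features (bill depth and bill length), multiplying independent per-feature likelihoods; all distribution parameters here are illustrative placeholders:

```python
from scipy.stats import norm

# Naive assumption: per class, each feature has its own independent distribution.
features = {
    "Gentoo": [norm(15.0, 1.0), norm(47.5, 3.1)],
    "Adelie": [norm(18.3, 1.2), norm(38.8, 2.7)],
}
priors = {"Gentoo": 0.45, "Adelie": 0.55}

def naive_bayes(x: list[float]) -> str:
    """Weight the product of per-feature likelihoods by the class prior."""
    def score(species: str) -> float:
        p = priors[species]
        for dist, value in zip(features[species], x):
            p *= dist.pdf(value)
        return p
    return max(features, key=score)

print(naive_bayes([17.0, 40.0]))  # Adelie, using both features at once
```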
What is the probability mass function?
A function that gives the probability that a discrete random variable is equal to a specific value.
What does D ~ P(X, y) signify?
The data D is sampled from the underlying distribution P(X, y).
What is the parameter θ in the context of probability distributions?
The parameters that define the distribution; which parameters these are varies with the type of distribution (p for a Bernoulli, mean and variance for a Gaussian).
What is the goal of maximum likelihood estimation (MLE)?
To find the parameter θ that maximizes the likelihood of the data.
True or False: The more samples we have, the better the histogram will be in representing the true underlying distribution.
True.
What is maximum likelihood estimation (MLE)?
MLE tries to find the θ that maximizes the likelihood of the data, meaning it finds the θ that maximizes P_θ(X, y)
MLE is a method used in statistics to estimate parameters of a statistical model.
What does maximum a posteriori (MAP) estimation assume about θ?
MAP assumes that θ is a random variable and allows for specifying a probability distribution for it
MAP incorporates prior beliefs about θ, which is known as the prior.
What is the prior in the context of MAP estimation?
The prior is the initial assumption about how θ is distributed
For example, assuming a coin is fair or biased before observing any data.
What is the relationship between MAP estimation and the posterior probability distribution?
MAP finds the θ that maximizes the posterior probability of θ given the prior and the data
The posterior represents updated beliefs about θ after observing the data.
What does learning the entire joint probability distribution P_θ(X, y) enable?
It enables generating new data that resemble the training data, leading to generative AI
This process involves sampling from the learned distribution.
What is the naïve Bayes classifier?
It is an algorithm that learns the joint probability distribution with simplifying assumptions and uses Bayes’s theorem
The naïve Bayes classifier is often used for classification tasks.
What is discriminative learning?
Discriminative learning focuses on calculating conditional probabilities of the data belonging to one class or another
It contrasts with generative learning, which models the entire data distribution.
What does P_θ(y | x) represent?
P_θ(y | x) is the conditional probability that a data point with feature vector x belongs to class y, given the learned parameters θ; the prediction is the class that maximizes it
This is used in discriminative learning to make predictions.
What is an example of an algorithm that uses discriminative learning?
An example is the nearest neighbor (NN) algorithm
The NN algorithm does not make assumptions about the underlying distribution of the data.
What kind of boundary does discriminative learning identify?
Discriminative learning identifies a boundary that separates clusters of data points
It can be a linear hyperplane or a nonlinear surface.
What is the significance of the nearest neighbor (NN) algorithm?
The NN algorithm achieved results nearly as good as the Bayes optimal classifier without underlying distribution assumptions
It was developed at Stanford in the 1960s.
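A minimal 1-nearest-neighbor sketch (not the original Stanford implementation): copy the label of the closest training point, with no distributional assumptions at all. The toy data are made up for illustration:

```python
import math

def nearest_neighbor(train: list[tuple[list[float], str]], x: list[float]) -> str:
    """Predict the label of the training point closest to x (Euclidean distance)."""
    _, label = min(train, key=lambda item: math.dist(item[0], x))
    return label

# Toy (features, label) pairs: bill depth and bill length.
train = [([15.0, 47.0], "Gentoo"), ([18.4, 39.0], "Adelie"), ([14.8, 48.2], "Gentoo")]
print(nearest_neighbor(train, [15.2, 46.5]))  # Gentoo
```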