Week 1 Flashcards

1
Q

Probability

A

A branch of mathematics concerning the analysis of random phenomena.

2
Q

Random phenomena

A

processes with an uncertain outcome.
(e.g., flipping a coin; gambling games)

3
Q

Inferential statistics and probability are related because…

A

sampling a group of people from the population is a random phenomenon.

4
Q

In probability, we know the true model/mechanism in the population. Based on the true model, we compute…

A

the probability of different outcomes.

e.g., If I flip a fair coin 10 times, how likely is it that I will get 5 heads?
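
The coin question above can be answered directly once the model (a fair coin, independent flips) is assumed known. A minimal sketch in R, using the base function dbinom() for the binomial PMF (the binomial distribution is only named later in this deck; the variable name is illustrative):

```r
# Probability of exactly 5 heads in 10 independent flips of a fair coin.
# dbinom() is the binomial probability mass function in base R.
p_five_heads <- dbinom(5, size = 10, prob = 0.5)
p_five_heads  # equals choose(10, 5) / 2^10 = 252 / 1024
```

So even for a fair coin, getting exactly 5 heads happens only about a quarter of the time.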

5
Q

In inferential statistics, we do NOT know…

A

the true model/mechanism in the population. We infer the true model based on the outcomes from our sample data.

e.g., If my friend flips a coin 10 times and gets 10 heads, are they playing a trick on me? In other words, is the coin a fair coin?
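
One way to reason about the friend's coin is to ask how likely 10 heads in 10 flips would be if the coin were fair. A small sketch in R (the variable names are illustrative, and independence of the flips is an assumption):

```r
# If the coin were fair, each flip is a head with probability 0.5,
# so 10 heads in a row has probability 0.5^10.
p_ten_heads <- 0.5^10
p_ten_heads                           # 1 / 1024, a little under 0.001

# The same number from the binomial PMF in base R.
dbinom(10, size = 10, prob = 0.5)
```

Such a small probability under the fair-coin model is what makes us doubt that the coin is fair.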

6
Q

In probability, the term experiment is used in a loose sense to mean…

A

a procedure for which the outcome is uncertain.

Examples of experiments include:
- an experimental study
- a toss of a coin

7
Q

sample space of the experiment

A

The set of all possible outcomes of an experiment. It is denoted by S.

It is best to think of the sample space as an area.

8
Q

random event or an event

A

A subset of the sample space

If the experiment consists of flipping two coins, the sample space is S = {HH, HT, TH, TT}, and an event can be getting a head on the first coin: E = {HH, HT}.
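
The two-coin example can be enumerated in R. This is only a sketch: the object names S and E mirror the notation on the cards, and treating the four outcomes as equally likely assumes two fair coins.

```r
# Sample space for flipping two coins: every combination of H and T.
S <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                 stringsAsFactors = FALSE)

# Event E: a head on the first coin (a subset of the sample space).
E <- S[S$first == "H", ]

nrow(S)            # number of outcomes in the sample space
nrow(E)            # number of outcomes in the event
nrow(E) / nrow(S)  # P(E), assuming all outcomes are equally likely
```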

9
Q

Probability measure

A

A function that maps the random events in the sample space onto the real numbers between 0 and 1.

The function “measures” the area of the event out of the whole sample space.

10
Q

Probability of an event E is denoted as

A

P(E)

11
Q

Frequentist and Bayesian perspectives have different conceptualizations of…

A

the probability measure.

That is, they hold different views on how we should map the events in the sample space onto the real numbers between 0 and 1.

12
Q

N(E) represents the…

A

number of times in the first N repetitions of the experiment that the event E occurs.

13
Q

In the frequentist perspective, what is the probability of an event?

A

The probability of the event is the proportion of times the event E occurs as we repeat the same experiment infinitely many times (i.e., the limit of N(E)/N as N approaches infinity).

Probability is the long-run relative frequency of the event’s occurrence, hence the name frequentist perspective.
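
The long-run idea can be illustrated by simulation. The sketch below assumes a fair coin (prob = 0.5) and uses an arbitrary seed purely for reproducibility; the names flips and running_prop are illustrative.

```r
set.seed(1)                     # arbitrary seed, only for reproducibility
N <- 100000                     # number of repetitions of the experiment
flips <- rbinom(N, size = 1, prob = 0.5)    # 1 = head (event E), 0 = tail

# N(E) / N after each repetition: the running proportion of heads.
running_prop <- cumsum(flips) / seq_len(N)

# The proportion wanders early on and settles near 0.5 as N grows.
running_prop[c(10, 100, 1000, N)]
```

A finite simulation can only approximate the limit, but the running proportion should drift toward the assumed probability.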

14
Q

In the Bayesian perspective, what is the probability of an event?

A

Probability represents the degree of your subjective belief about the occurrence of an event.

15
Q

frequentist definition

A

long-run probability

16
Q

bayesian

A

degree of belief

17
Q

Properties of frequentist perspective

A

Objective/Unambiguous

Can’t assign probability to events that are not replicable

18
Q

Properties of bayesian perspective

A

Subjective/Ambiguous

Can assign probability to any event

19
Q

What is a random variable?

A

A random variable is a function that maps random events in the sample space of an experiment onto the real number line.

Through a random variable, we can use numbers to quantify or represent the occurrence of an event.
- usually denoted by a capital letter (e.g., X or Y)
- different from an algebraic variable (e.g., a in a + 5), which means any unspecified number.

20
Q

An indicator (or Bernoulli) random variable (X) maps…

A

the occurrence of the event to 1.
the non-occurrence of the event to 0

21
Q

How to denote a bernoulli random variable:

For example, let X indicate whether we get a head after a coin flip.

A

X(H) = 1
X(T) = 0

22
Q

Discrete random variables

A

can only take on specific values, usually whole numbers

e.g., indicator random variable (X = 0, 1); binomial random variable (X = 0, 1, 2, 3, ...)

countable number of values.

23
Q

Continuous random variables

A

can take on any value in an interval

e.g., normal random variable. It can take on any value on the real number line, from negative to positive infinity (e.g., X = 0.00001).

uncountable number of values.

24
Q

What does the probability measure of the random variable map?

A

For a random variable, the probability measure maps the values of the random variable onto a value between 0 and 1, which measures the likelihood of the values of the random variable.

25
Q

probability distribution.

A

Each random variable has a probability distribution.

Discrete: probability mass function (PMF)
- Tells us the probability associated with each possible value of the random variable.

Continuous: probability density function (PDF)

26
Q

Probability mass function of random variable

A

In the example of X being the indicator random variable representing getting a head after a fair coin flip, the PMF of X is
P(X=0) = 0.5
P(X=1) = 0.5
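
The same PMF can be computed in R. This sketch relies on the fact that a Bernoulli random variable is a binomial random variable with a single trial, so dbinom() with size = 1 gives its PMF; the name pmf is illustrative.

```r
# PMF of the indicator variable X for a head on one fair coin flip.
# A Bernoulli(p) variable is Binomial(size = 1, prob = p).
pmf <- dbinom(c(0, 1), size = 1, prob = 0.5)
names(pmf) <- c("P(X=0)", "P(X=1)")
pmf

sum(pmf)  # the probabilities over all possible values add up to 1
```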

27
Q

Bernoulli Distribution

A

If a random variable is a Bernoulli random variable, we can say that the random variable follows the Bernoulli
distribution.

By Bernoulli distribution, we mean the probability distribution associated with the Bernoulli random variable.

28
Q

If X is a Bernoulli random variable where P(X=1) = p then we can write

A

X ~ Ber(p)

where the symbol “~” stands for “follows”, and Ber stands for Bernoulli distribution.

29
Q

For brand-named random variables, their distributions are characterized by a small number of parameters.

Explain parameter in this context

A

For X ~ Ber(p), p is the parameter that fully describes the Bernoulli distribution.

Parameters are considered non-random, fixed variables.

This usage of the term “parameter” is slightly different from, but related to, the case where “parameter” means a quantity computed from population data.

30
Q

Normal distribution

A

A normal random variable (a.k.a., Gaussian random variable) is a continuous random variable that follows the famous “bell curve” distribution.

31
Q

The “bell curve” distribution is called the __________________________________________________________ of the normal random variable

A

The “bell curve” distribution is called the probability density function (PDF) of the normal random variable

32
Q

The normal random variable is characterized by two parameters:

A

1. expected value µ
2. variance σ^2

33
Q

How to denote X as a normal random variable:

A

X ~ N(µ, σ^2)

34
Q

Standard Normal Random Variable and how to denote it

A

When the normal random variable has a mean of 0 and a variance of 1, then it is called the standard normal random variable, usually denoted as

Z ~ N(0, 1)

35
Q

We can transform any normal random variable to the standard normal random variable. Suppose X ~ N(µ, σ^2). Then you can transform X to follow the standard normal distribution by…

A

Z = (X - µ) / σ, so that Z ~ N(0, 1)
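
The standardization Z = (X - µ) / σ can be checked numerically. In this sketch the values of mu, sigma, and x are hypothetical, chosen only for illustration.

```r
mu <- 100      # hypothetical population mean
sigma <- 20    # hypothetical population standard deviation
x <- 130       # a value of X ~ N(mu, sigma^2)

# Standardize: subtract the mean, then divide by the standard deviation.
z <- (x - mu) / sigma
z

# P(X <= x) on the original scale equals P(Z <= z) on the standard scale.
pnorm(x, mean = mu, sd = sigma)
pnorm(z)
```

Both pnorm() calls return the same probability, which is why tables and rules stated for Z apply to any normal random variable.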

36
Q

What is the probability of a continuous random variable taking on any specific value?

A

For a continuous random variable, we cannot meaningfully talk about the probability of the random variable taking on any specific value, because that probability is always zero.

For a continuous random variable, we can only talk about the probability of the random variable taking on a range of possible values.

37
Q

cumulative distribution function (CDF)

A

tells us the probability of a random variable taking on a value that is equal to or less than a cutoff point.

P(X ≤ a) or P(X < a) is the area under the PDF curve below a. For a continuous random variable the two are equal, because P(X = a) = 0.
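
Because the CDF is an area under the PDF, the probability of a range is a difference of two CDF values. The sketch below uses the standard normal and hypothetical cutoffs a and b; integrate() is base R's numerical integration routine.

```r
a <- -1   # hypothetical lower cutoff
b <- 2    # hypothetical upper cutoff

# P(a < Z <= b) as a difference of two CDF values.
p_range <- pnorm(b) - pnorm(a)
p_range

# The same probability as the area under the PDF between a and b.
area <- integrate(dnorm, lower = a, upper = b)$value
area
```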

38
Q

68–95–99.7 Rule

A

The 68–95–99.7 rule is a shorthand used to remember the percentage of values that lie within one, two, and three standard deviations of the mean in a normal distribution: about 68%, 95%, and 99.7%, respectively.
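
The three percentages can be recovered from the standard normal CDF. A short sketch (the names k and within_k_sd are illustrative):

```r
k <- 1:3   # number of standard deviations from the mean

# P(-k <= Z <= k) for a standard normal random variable Z.
within_k_sd <- pnorm(k) - pnorm(-k)

# As percentages; these are close to 68, 95, and 99.7.
100 * within_k_sd
```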

39
Q

There are four R functions for the normal distribution:

A

dnorm()
pnorm()
qnorm()
rnorm()

40
Q

dnorm()

A

The dnorm() function computes the PDF of the normal
distribution.

Outputs the probability density of a normal random variable at a specific value. A density is not a probability.

Not commonly used because, for continuous random variables, the probability of a range of values (i.e., the area under the PDF) is more important.
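
A brief sketch of dnorm() in use. The second call uses a hypothetical, very narrow distribution to show that a density can exceed 1, so it should not be read as a probability.

```r
# Height of the standard normal PDF at its peak: 1 / sqrt(2 * pi).
d_peak <- dnorm(0, mean = 0, sd = 1)
d_peak

# With a small standard deviation the curve is tall and narrow,
# and the density at the mean is greater than 1.
d_narrow <- dnorm(100, mean = 100, sd = 0.1)
d_narrow
```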

41
Q

pnorm()

A

The pnorm() function computes the CDF of the normal
distribution

Outputs the probability of a normal random variable taking on values at or below the quantile value.

Need to input:
q: the quantile value at which you want to compute the
probability.
mean: value for the parameter µ.
sd: value for the parameter σ.

Other input:
lower.tail: logical; whether you want the lower tail probability (TRUE) or the upper tail probability (FALSE). By default, lower.tail = TRUE.
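
A small sketch of pnorm() with both settings of lower.tail. The cutoff 120 and the parameters mean = 100, sd = 20 are hypothetical values for illustration.

```r
# P(X <= 120) for X ~ N(100, 400); sd is the square root of the variance.
p_lower <- pnorm(120, mean = 100, sd = 20)
p_lower

# P(X > 120): the upper tail for the same cutoff.
p_upper <- pnorm(120, mean = 100, sd = 20, lower.tail = FALSE)
p_upper

# The two tails together cover the whole distribution.
p_lower + p_upper
```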

42
Q

qnorm()

A

The qnorm() function computes the quantile value given a probability below the quantile value.

Outputs the quantile value.

Need to input:
p: the probability below the quantile value.
mean: value for the parameter µ.
sd: value for the parameter σ.

Other input:
lower.tail: logical; whether the probability p you specified is a lower tail probability (TRUE) or an upper tail probability (FALSE). By default, lower.tail = TRUE.
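
qnorm() is the inverse of pnorm(), which the following sketch illustrates with the standard normal (default mean = 0 and sd = 1). The probability 0.975 is just a commonly used example value.

```r
# The value below which 97.5% of a standard normal distribution lies.
q <- qnorm(0.975)
q            # about 1.96

# Feeding the quantile back into pnorm() recovers the probability.
pnorm(q)

# The same quantile, specified through the upper tail probability.
q_upper <- qnorm(0.025, lower.tail = FALSE)
q_upper
```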

43
Q

In the R functions what do you need to remember about the variance?

A

Note: The normal distribution is written with the variance, N(µ, σ^2), but the R functions take the standard deviation. Remember to take the square root of the variance for the argument sd.

44
Q

rnorm()

A

generates/simulates random numbers from the
normal distribution

Suppose our population data follow a normal distribution N(100, 400). We want to simulate randomly sampling 10 values from the population. Then we can do
rnorm(n = 10, mean = 100, sd = sqrt(400))
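
A slightly extended sketch of the same call. The seeds are arbitrary and only make the random draws reproducible; the second, larger sample is an added illustration showing that the simulated values have a standard deviation near sqrt(400) = 20, not 400.

```r
set.seed(123)   # arbitrary seed, only for reproducibility
x <- rnorm(n = 10, mean = 100, sd = sqrt(400))
x               # 10 simulated values from N(100, 400)

# With a much larger sample, the sample mean and sample standard
# deviation land close to the population values 100 and 20.
set.seed(456)
big <- rnorm(n = 100000, mean = 100, sd = sqrt(400))
mean(big)
sd(big)
```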

45
Q

Binomial distribution:
Notation?
What kind of random variable?

A

X ~ Bin(N, p)

discrete random variable

46
Q

Chi-square distribution:
Notation?
What kind of random variable?

A

X ~ χ^2(df)

continuous random variable

47
Q

t distribution:
Notation?
What kind of random variable?

A

X ~ t(df)

continuous random variable

48
Q

Random variables characteristics

A

Associated with random events.

Have a probability distribution.

Can take on more than one possible value.

Denoted using capital letters (e.g., X, Y)

49
Q

Constants or Fixed Values

A

Associated with non-random events.

Do not have a probability distribution.

Can only take on one possible value.

Denoted using lowercase letters (e.g., a, x)

50
Q

What does a random variable quantify?

A

a random procedure’s different outcomes.

51
Q

Once you see the random procedure’s outcome, the observed value is called…

A

the realized value of a random variable.

The realized value of a random variable is treated as a constant.

52
Q

empirical probability distribution.

A

We can also realize this random variable multiple times and then graph the empirical probability distribution

We can realize the random variable 10 times by flipping a fair coin 10 times.

The empirical probability distribution is an estimate of the theoretical probability distribution.
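
The 10-flip example can be simulated in R. In this sketch the seed is arbitrary, 1 stands for a head, and the names x and empirical are illustrative.

```r
set.seed(42)   # arbitrary seed, only for reproducibility

# Realize the indicator random variable 10 times (10 fair coin flips).
x <- rbinom(10, size = 1, prob = 0.5)
x

# Empirical distribution: the proportion of 0s and 1s actually observed.
# factor() keeps both levels even if one of them never occurs.
empirical <- table(factor(x, levels = c(0, 1))) / length(x)
empirical

# Theoretical PMF for comparison: P(X = 0) = P(X = 1) = 0.5.
```

With only 10 realizations the empirical proportions will usually differ from 0.5; they tend to get closer as the number of realizations grows.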

53
Q

Usually, the population data of a variable are assumed to follow…

A

the normal distribution

54
Q

Sample statistics (e.g., the sample mean) across repeated studies are _____________________ __________

A

random variables

They have a probability distribution.

55
Q

Population parameters (e.g., the population mean) are __________________

A

constants

do not have a probability distribution

56
Q

The sample data are random across repeated sampling;
therefore, sample statistics are also ___________

A

random

57
Q

Population parameters are considered _____________________________ in the Frequentist perspective.

A

constants (or fixed values)

58
Q

Do population parameters have any probability distributions associated?

A

No, because they are constants.

59
Q

Parameters of a random variable:

A

numerical quantities that fully describe a distribution

e.g., µ and σ^2 in X ~ N(µ, σ^2)

60
Q

Population parameters:

A

numerical quantities characterizing the population data

61
Q

From the Bayesian perspective, population parameters are considered…

A

random variables because we are uncertain about
their values.

In Bayesian statistics, you can specify a probability
distribution for each parameter.
This is called the prior distribution.

62
Q

CLT roughly implies what?

A

that when we add or average a large number of random variables, the sum or the mean of the random variables is a random variable that approximately follows a normal distribution.

In other words, when you add or average different random events together and use a random variable to quantify the result, the probability distribution of that random variable is approximately normal.

63
Q

CLT formula

A

At a large n, the sample mean X̄ approximately follows a normal distribution:

X̄ ~ N(µ_X̄ = µ, σ^2_X̄ = σ^2 / n)
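
The mean and variance in this formula can be checked by simulation. The sketch below assumes a hypothetical population N(100, 400) and sample size n = 25, so the formula predicts a mean of 100 and a variance of 400 / 25 = 16 for the sample mean; the seed and the number of repeated samples are arbitrary.

```r
set.seed(2024)   # arbitrary seed, only for reproducibility
mu <- 100        # hypothetical population mean
sigma2 <- 400    # hypothetical population variance
n <- 25          # sample size of each simulated study

# Draw 10,000 samples of size n and keep each sample mean.
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sqrt(sigma2))))

mean(xbars)   # should be close to mu = 100
var(xbars)    # should be close to sigma2 / n = 16
```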

64
Q

µ_X̄

A

the mean of the sampling distribution of the sample mean X̄

66
Q

σ_X̄

A

the standard deviation of the sampling distribution of the sample mean X̄; also called the standard error of the mean (SEM)
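
Since the variance of the sample mean is σ^2 / n, the standard error of the mean is σ / sqrt(n). A short arithmetic sketch with a hypothetical σ = 20 and a few illustrative sample sizes:

```r
sigma <- 20                  # hypothetical population standard deviation
n <- c(4, 25, 100, 400)      # illustrative sample sizes

# Standard error of the mean for each sample size.
sem <- sigma / sqrt(n)
sem   # 10 4 2 1
```

Quadrupling the sample size only halves the standard error, because n enters through its square root.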

67
Q

In essence, the CLT roughly implies

A

that when we add or average a large number of random variables, each with finite µ and σ^2, the sum or the mean of the random variables approximately follows a normal distribution.

This implies that when you add different random events together and map them onto a number line, the result approximately follows the normal distribution.
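
The claim does not require the individual random variables to be normal. As a rough check, the sketch below averages draws from an exponential distribution with rate 1 (a skewed distribution with µ = 1 and σ^2 = 1, chosen here only as an example) and asks whether the sample means behave like a normal variable, using the 68% part of the 68–95–99.7 rule. The seed, n, and the number of repetitions are arbitrary.

```r
set.seed(7)   # arbitrary seed, only for reproducibility
n <- 50       # number of random variables averaged each time

# 10,000 sample means, each from n skewed (exponential) draws.
xbars <- replicate(10000, mean(rexp(n, rate = 1)))

# For rate = 1 the population mean is 1 and the standard deviation is 1,
# so the standard error of the mean is 1 / sqrt(n).
se <- 1 / sqrt(n)

# Share of sample means within one standard error of the population mean;
# roughly 0.68 if the sample mean is approximately normal.
prop_within_1se <- mean(abs(xbars - 1) <= se)
prop_within_1se
```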

68
Q

One of the most common applications of the CLT is regarding

A

the sampling distribution of the sample mean.

69
Q

What is the sampling distribution of the sample mean

A

The sampling distribution of the sample mean is the
distribution of the sample mean over repeated samples.

“Over repeated samples” means “conducting the same experiment (with a fixed sample size n) infinitely many times.”

This is related to the frequentist perspective.

70
Q

according to CLT, the sampling distribution of
the sample mean is a ______________ distribution

A

normal (approximately, when n is large)