Chapter 3: Fundamentals of Statistics Flashcards

1
Q

What is a random variable?

A

This is one that takes on numerical values and has an outcome that is determined by an experiment.

In other words, a random variable is defined as a variable that takes an observed random (and not deterministic) value.

For this chapter we denote random variables by uppercase letters (usually W, X, Y, and Z), whereas outcomes of random variables are denoted by the corresponding lowercase letters (w, x, y, and z).

2
Q

What is an example of a random variable?

A

▶ Outcome of a fair coin.
▶ Outcome of a fair die

We are interested in the number of Tails when tossing a fair coin twice.
▶ The sample space of the two coin tosses is:
▶ Ω = {HH, TH, HT, TT}
▶ We define X to be the number of Tails among the two tosses. Then X can take on 3 possible values:
X = 0 if HH
X = 1 if TH or HT
X = 2 if TT

▶ The sample space of X is: ΩX = {0, 1, 2}

3
Q

What is a Bernoulli random variable?

A

This is a random variable that can only take on the values zero and one.

A Bernoulli random variable is sometimes called a binary random variable.

4
Q

What is a discrete random variable?

A

This is one that takes only a finite number of values.

A bernoulli random variable is the simplest example of a discrete random variable.

Other examples:
▶ number of tails when tossing a coin twice,
▶ number of students registering for a class.

5
Q

What is an example of a Bernoulli random variable?

A

The coin flipping example.

If the coin is ‘fair’, then P(X = 1) = 1/2 (read as ‘the probability that X equals one is one-half’).

Because probabilities must sum to one, P(X = 0) = 1/2 also.

6
Q

What symbol usually represents an unknown probability?

A

θ (theta)

e.g. the probability of any particular customer showing up can be any number between zero and one:
P(X = 1) = θ
P(X = 0) = 1 - θ

If θ = 0.75 then there is a 75% chance that a customer shows up after making a reservation and a 25% chance that the customer does not show up.

7
Q

How is any discrete random variable usually depicted?

A

It’s usually depicted/described by listing its possible values and the associated probability that it takes on each value.

If X takes on the k possible values {x1, …, xk}, then the probabilities p1, p2, …, pk are defined by:

pj = P(X = xj), j = 1, 2, …, k
where each pj is between 0 and 1, and
p1 + p2 + … + pk = 1

8
Q

What are probability density functions (pdf)?

A

The pdf of X summarises the information concerning the possible outcome of X and the corresponding probabilities:
f(xj) = pj, j = 1, 2, …, k

9
Q

Suppose that X is the number of free throws made by a basketball player out of two attempts, so that X can take on the three values {0, 1, 2}.

Assume the pdf of X is given by:
f(0) = 0.2
f(1) = 0.44
f(2) = 0.36

What is the probability that the player makes at least one free throw?

A

At least one free throw = P(X ≥ 1)

P(X ≥ 1) = P(X = 1) + P(X = 2)
P(X ≥ 1) = 0.44 + 0.36
P(X ≥ 1) = 0.80
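The same calculation can be sketched in a few lines of Python (the dict representation of the pdf is an assumption for illustration, not from the text):

```python
# Minimal sketch: the pdf of X stored as a plain dict {value: probability}.
pdf = {0: 0.20, 1: 0.44, 2: 0.36}

# P(X >= 1) = P(X = 1) + P(X = 2)
p_at_least_one = sum(p for x, p in pdf.items() if x >= 1)
```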

10
Q

How do you draw a probability density function (pdf)?

A

f(x) on the y axis
x on the x axis

Probabilities are drawn as vertical lines starting from 0 on the x axis and rising to their respective heights on the y axis (shown on page 48 of the textbook).

11
Q

What are the types of random variables?

A

A discrete random variable takes on a finite number of values. For example:
▶ number of tails when tossing a coin twice,
▶ number of students registering for a class.

A continuous random variable takes on any value in a real interval. For example:
▶ Time to complete an assignment.
▶ Wages
▶ Return on stock market

12
Q

What is a continuous random variable?

A

A variable X is a continuous random variable if it takes on any real value with zero probability.

The idea is that a continuous random variable can take on so many possible values that we cannot count them or match them up with the positive integers.

For example:
▶ Time to complete an assignment.
▶ Wages
▶ Return on stock market

13
Q

How are pdfs used for continuous random variables?

A

We use the pdf of a continuous random variable only to compute probabilities of events involving a range of values, because it makes no sense to discuss the probability that a continuous random variable takes on a particular value.

e.g. if a and b are constants where a < b, the probability that X lies between the numbers a and b, P(a ≤ X ≤ b), is the area under the pdf between points a and b.

To find this value you compute the integral of the function f between points a and b.
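This integral can be approximated numerically. A hedged sketch, using the midpoint rule and the uniform(0, 1) density as an assumed example (not from the text):

```python
# Approximate P(a <= X <= b) as the area under a pdf f between a and b.
def prob_between(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Assumed example density: uniform on [0, 1].
uniform = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
p = prob_between(uniform, 0.2, 0.7)   # exact answer is 0.5
```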

14
Q

How do you draw a pdf for continuous random variables?

A

f(x) on the y axis
x on the x axis

The area underneath the function represents probability, meaning the entire area under the pdf must always equal one (example on page 49 of the textbook).

15
Q

What are cumulative distribution functions (cdf) used for?

A

When computing probabilities for continuous random variables, it is easiest to work with the cdf.

If X is any random variable, then its cdf is defined for any real number x by:
F(x) = P(X ≤ x)

16
Q

What are two important properties of cdfs that are useful for computing probabilities?

A

For any number c, P(X > c) = 1 - F(c)

For any numbers a < b, P(a < X ≤ b) = F(b) - F(a)
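Both properties can be checked on a small discrete example. A sketch, reusing the free-throw pdf from an earlier card (the dict layout is an assumption for illustration):

```python
# Discrete pdf of X: number of free throws made.
pdf = {0: 0.20, 1: 0.44, 2: 0.36}

def F(x):
    """cdf: F(x) = P(X <= x)."""
    return sum(p for v, p in pdf.items() if v <= x)

p_greater = 1 - F(0)        # property 1: P(X > 0) = 1 - F(0)
p_interval = F(2) - F(0)    # property 2: P(0 < X <= 2) = F(2) - F(0)
```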

17
Q

What are some useful (univariate) distributions?

A

Some discrete distributions that are useful in modeling social phenomena include the:
▶ Bernoulli, for binary outcomes (e.g. pass/fail in a test)
▶ Binomial, for independent repetitions of Bernoulli “trials” (e.g., number of successes in throwing a basketball)
▶ Poisson, for count variables (e.g. number of clicks on a site in a minute)

Some continuous distributions that are useful include the:
▶ Normal, for measurement errors
▶ Exponential, for waiting time for first occurrence (e.g., of first patient)
▶ Student t, Chi-square χ2, F distribution for testing hypotheses

18
Q

What is a joint distribution?

A

This describes events involving more than one random variable.

▶ We study these because we are usually interested in phenomena that involve more than one random variable:
▶ e.g. wage and gender, temperature and covid infection, etc.
▶ Therefore we study joint distributions.
▶ The joint distribution is described by the joint CDF, or the joint PDF.

19
Q

What is an example of joint distributions?

A

For example, the joint distribution of two discrete RVs:
▶ X : # of women among two customers
▶ Y : number of items bought

Is given by the table:

fXY             X
           0     1     2
      0  0.05  0.10  0.03
Y     1  0.21  0.11  0.19
      2  0.08  0.15  0.08

▶ Each cell is the joint probability Pr(X = x ∩ Y = y ).
▶ For example, Pr(X = 0 ∩ Y = 0) = .05.

20
Q

What are marginal distributions?

A

This gives the probabilities of various values of the variables in a subset without reference to the values of the other variables.

The marginal distributions of each RV can be obtained from the joint distribution:
fY(y) = Pr(Y = y) = ∑x Pr(Y = y ∩ X = x)
fX(x) = Pr(X = x) = ∑y Pr(X = x ∩ Y = y)

21
Q

Find Pr(Y = 0), given that the marginal PDF of Y is obtained by
computing the probabilities:
▶ Pr(Y = 0)
▶ Pr(Y = 1)
▶ Pr(Y = 2)

A

Pr (Y = 0) = ∑x=0,1,2 Pr (Y = 0 ∩ X = x)

(according to the table)
When Y = 0, the joint probabilities are 0.05 (X = 0), 0.10 (X = 1) and 0.03 (X = 2).

Pr (Y = 0) = Pr ({Y = 0 ∩ X = 0}) + Pr ({Y = 0 ∩ X = 1})+ Pr ({Y = 0 ∩ X = 2})

= 0.05 + 0.1 + 0.03 = 0.18
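The marginalisation sums can be sketched in Python; the joint table is the one from the customer example (the dict layout is an assumed representation):

```python
# joint[(x, y)] = Pr(X = x and Y = y), from the table in the example.
joint = {
    (0, 0): 0.05, (1, 0): 0.10, (2, 0): 0.03,
    (0, 1): 0.21, (1, 1): 0.11, (2, 1): 0.19,
    (0, 2): 0.08, (1, 2): 0.15, (2, 2): 0.08,
}

def f_Y(y):
    """Marginal Pr(Y = y) = sum over x of Pr(Y = y and X = x)."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def f_X(x):
    """Marginal Pr(X = x) = sum over y of Pr(X = x and Y = y)."""
    return sum(p for (xx, y), p in joint.items() if xx == x)
```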

22
Q

When are random variables independent?

A

▶ Two RVs are independent when knowing the value of one does not change the distribution (the probabilities) of the other.

▶ Formally, two RVs are independent if and only if the joint distribution is the product of the marginal distributions:
fXY(x, y) = fX(x) · fY(y)

▶ Independence is symmetric: If Y is independent of X then X is independent of Y.

23
Q

How do you check the independence of random variables?

A

▶ To check independence, we need to check for all pairs (x, y ) if Pr (Y = y ∩ X = x) = Pr (Y = y) · Pr (X = x) .
▶ In our example independence clearly does not hold, since

e.g.
Pr(Y = 0 ∩ X = 1) ≠ Pr(Y = 0) · Pr(X = 1)
0.10 ≠ 0.18 × 0.36

(according to table)
Pr (Y = 0 ∩ X = 1) = 0.1 (when y = 0 and x = 1)
Pr (Y = 0) = 0.18
Pr (X = 1) = 0.36
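The all-pairs check can be sketched directly on the joint table (the dict layout is an assumed representation of the example table):

```python
# Check independence: every cell must equal the product of its marginals.
joint = {
    (0, 0): 0.05, (1, 0): 0.10, (2, 0): 0.03,
    (0, 1): 0.21, (1, 1): 0.11, (2, 1): 0.19,
    (0, 2): 0.08, (1, 2): 0.15, (2, 2): 0.08,
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
fX = {x: sum(joint[(x, y)] for y in ys) for x in xs}
fY = {y: sum(joint[(x, y)] for x in xs) for y in ys}

independent = all(
    abs(joint[(x, y)] - fX[x] * fY[y]) < 1e-9 for x in xs for y in ys
)
# For this table, independence fails (e.g. 0.10 != 0.36 * 0.18).
```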

24
Q

What are conditional distributions?

A

A conditional distribution is the distribution of values of one variable when you fix the values of other variables. This type of distribution allows you to assess the behaviour of your variable of interest under specific conditions, hence the name.

This information is summarized by the conditional probability density function, defined by:
fY|X(y|x) = fXY(x, y)/fX(x)
for all values of x such that fX(x) > 0

25
Q

How are conditional distributions depicted with discrete random variables?

A

The conditional probability density function [fY|X(y|x) = fXY(x, y)/fX(x)] is most easily interpreted when X and Y are discrete; then:

fY|X (y|x) = P(Y = y|X = x)

The right-hand side is read as ‘the probability that Y = y given that X = x’.

26
Q

How are conditional distributions depicted with continuous random variables?

A

When Y is continuous, fY|X(y|x) is not interpretable directly as a probability, for the reasons discussed earlier, but conditional probabilities are found by computing areas under the conditional pdf.

27
Q

What are the few aspects of distributions of random variables that we will focus on?

A
  • Measures of central tendency (the expected value, the median);
  • Measures of variability or spread (variance and standard deviation); and
  • Measures of association between two random variables (covariance and correlation).
28
Q

What is the expected value (EV)?

A

If X is a random variable, the expected value (or expectation) of X (denoted E(X) and sometimes μX or simply μ) is a weighted average of all possible values of X.

AKA - the mean.

▶ Expected value or Expectation of a RV:
E(X) = ∑x x · Pr(X = x) for a discrete RV
E(X) = ∫ x · f(x) dx for a continuous RV

29
Q

What determines the weighted average of all possible values of X?

A

The Probability Density Function.

30
Q

If X is a continuous random variable, then what is the expected value defined as?

A

E(X) is defined as an integral:
E(X) = ∫ x · f(x) dx

Note: the integral runs from −∞ to ∞.

31
Q

Example of EV on discrete random variables:
▶ A fair die is tossed:
▶ You win $2 if the result is 1
▶ You win $1 if the result is a 6
▶ but otherwise you lose $1

What is the expected value/expectation from playing this game?

A

X:   $2    $1    −$1
p:   1/6   1/6   4/6

E(X) = $2 · (1/6) + $1 · (1/6) − $1 · (4/6) ≈ −$0.17

▶ On average you will lose 17 cents per play by playing this game.
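The same expectation in a short Python sketch (the list-of-pairs layout is an assumed representation):

```python
# Winnings paired with their probabilities for the die game.
outcomes = [(2.0, 1 / 6), (1.0, 1 / 6), (-1.0, 4 / 6)]

# E(X) = sum of value * probability = -1/6, about -$0.17 per play.
ev = sum(x * p for x, p in outcomes)
```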

32
Q

What are the rules for calculating EV?

A

When calculating expectations we should be aware of the following rules:
1. E(a) = a for any real constant a
2. E(a · X) = a · E(X)
3. E(a + b · X) = a + b · E(X)
4. E(r(X) + h(X)) = E(r(X)) + E(h(X))
5. E(a · X + b · Y) = a · E(X) + b · E(Y)

▶ Note that in general
E(r(X)) ≠ r(E(X))
For example
E(log(X)) ≠ log(E(X))
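Linearity, and the warning that E(log(X)) differs from log(E(X)), can be checked numerically. A sketch on an assumed discrete pdf with positive support (so log is defined):

```python
import math

pdf = {1: 0.20, 2: 0.44, 4: 0.36}   # assumed example pdf

def E(g):
    """Expectation of g(X) under the discrete pdf."""
    return sum(g(x) * p for x, p in pdf.items())

lhs = E(lambda x: 5 + 3 * x)        # E(a + b*X)
rhs = 5 + 3 * E(lambda x: x)        # a + b*E(X): equal by linearity

e_log = E(math.log)                 # E(log(X))
log_e = math.log(E(lambda x: x))    # log(E(X)): NOT the same in general
```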

33
Q

What is the median?

A

This is another measure of central tendency.

34
Q

What is the median?

A

This is another measure of central tendency.

If X is continuous, then the median of X, say m, is the value such that one-half of the area under the pdf is to the left of m and one-half is to the right of m.

35
Q

When X is discrete and takes on an odd, finite number of values, how is the median obtained?

A

When X is discrete and takes on an odd, finite number of values, the median is obtained by ordering the possible values of X and then selecting the value in the middle.

For example, if X can take on the values {-4, 0, 2, 8, 10, 13, 17}, then the median value of X is 8.

36
Q

Is median (Med(X)) or mean (E(X)) better?

A

They are different, but neither is better than the other as a measure of central tendency.

They are both valid ways to measure the centre of the distribution of X.

In one special case, the median and EV are the same: if X has a symmetric distribution about the value μ, then μ is both the EV and the median.

37
Q

What is variance?

A

▶ The variance is a measure of the dispersion of the RV around its mean.
▶ It is defined as: σ² = Var(X) = E((X − E(X))²)
▶ The variance measures the expected squared distance of X from its mean.
▶ The variance is a nonnegative real number, measured in the square of the units in which X is measured.
▶ We often find its square root, the standard deviation, more useful:

σ = √σ²

38
Q

Why is variance important to know?

A

Because it is a measure of variability, it is used because measures of central tendency do not tell us everything we want to know about the distribution of a random variable.

39
Q

What are the two important properties of variance?

A
  1. Var(X) = 0 if, and only if, there is a constant c such that P(X = c) = 1, in which case E(X) = c.

This first property says that the variance of any constant is zero, and if a random variable has zero variance then it is essentially constant.

  2. For any constants a and b, Var(aX + b) = a²Var(X)

This means that adding a constant to a random variable does not change the variance, but multiplying a random variable by a constant increases the variance by a factor equal to the square of that constant.
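Both properties can be verified numerically. A sketch on an assumed discrete pdf (the values of a and b are arbitrary illustrations):

```python
# Verify Var(aX + b) = a^2 * Var(X) and Var(constant) = 0.
pdf = {0: 0.20, 1: 0.44, 2: 0.36}   # assumed example pdf

def mean(pdf):
    return sum(x * p for x, p in pdf.items())

def var(pdf):
    mu = mean(pdf)
    return sum(p * (x - mu) ** 2 for x, p in pdf.items())

a, b = 3.0, 5.0
shifted = {a * x + b: p for x, p in pdf.items()}   # pdf of aX + b
```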

40
Q

What is standard deviation?

A

This is denoted as sd(X) and is simply the positive square root of the variance: sd(X) = + √var(X)

The standard deviation is sometimes denoted σx or σ.

41
Q

What are the properties of standard deviation?

A
  1. For any constant c, sd(c) = 0
  2. For any constants a and b, sd(aX + b) = |a|sd(X)
42
Q

How do you standardise a random variable?

A

We define a new random variable Z by subtracting off the mean of X (μ) and dividing by its standard deviation (σ):

Z = (X − μ)/σ

This can be written as: Z = aX + b, where a = 1/σ and b = −μ/σ

Therefore:
- E(Z) = aE(X) + b = (μ/σ) − (μ/σ) = 0
- Var(Z) = a²Var(X) = σ²/σ² = 1
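A quick numerical check of the standardisation, on an assumed discrete pdf:

```python
import math

# Standardise an assumed discrete RV; Z should have mean 0 and variance 1.
pdf = {0: 0.20, 1: 0.44, 2: 0.36}
mu = sum(x * p for x, p in pdf.items())
sigma = math.sqrt(sum(p * (x - mu) ** 2 for x, p in pdf.items()))

z_pdf = {(x - mu) / sigma: p for x, p in pdf.items()}   # pdf of Z
ez = sum(z * p for z, p in z_pdf.items())
vz = sum(p * (z - ez) ** 2 for z, p in z_pdf.items())
```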

43
Q

What is covariance and correlation?

A

They are measures of the linear relationship between two variables - AKA Measures of Association.

While the joint pdf of two random variables completely describes the relationship between them, it is useful to have summary measures of how, on average, two random variables vary with one another.

44
Q

What is covariance?

A

▶ Covariance measures the degree of linear association between the variables
▶ Cov (X , Y) = E((X − E(X)) · (Y − E(Y)))

45
Q

What are the rules for calculating covariance?

A
  1. Cov(a · X , b · Y) = a · b · Cov(X , Y) for any constants a, b.
  2. Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2,Y)
  3. Cov(X, X) = Var(X)
  4. Cov (X , Y ) = E(X · Y ) − E(X ) · E(Y)
46
Q

What property shows how covariance is related to the notion of independence?

A

Property Cov. 1:
If X and Y are independent, then Cov(X, Y) = 0

It’s important to remember that the converse of cov.1 is not true: this means that zero covariance between X and Y does not imply that X and Y are independent.

47
Q

What property shows covariances between linear functions?

A

Property Cov. 2:
For any constants a1, b1, a2 and b2,
Cov(a1X + b1, a2Y + b2) = a1a2cov(X, Y)

48
Q

What property shows the absolute value of the covariance between any two random variables is bounded by the product of their standard deviations?

A

Property Cov. 3:
|cov(X,Y)| ≤ sd(X)sd(Y)

This is known as the Cauchy-Schwarz inequality.

49
Q

Why do we use the correlation coefficient?

A

How we measure variables may have no bearing on how strongly they are related, BUT the covariance between them does depend on the units of measurement.

E.g. the covariance between education and earnings depends on whether earnings are measured in dollars or thousands of dollars, or whether education is measured in months or years.

SUMMARY: the magnitude of Cov(X, Y) depends on the units of X and Y; this is why we often use the correlation coefficient.

50
Q

What is the correlation coefficient?

A

The fact that covariance depends on units of measurement is a deficiency that is overcome by the correlation coefficient between X and Y:

ρXY = Corr(X, Y) = Cov(X, Y)/[sd(X) · sd(Y)]
= σXY/(σX · σY)

which is a unit-free measure of their linear association.
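Covariance and correlation can be computed directly from a joint pdf. A sketch using the joint table from the earlier customer example (the dict layout is an assumed representation):

```python
import math

joint = {  # joint[(x, y)] = Pr(X = x and Y = y)
    (0, 0): 0.05, (1, 0): 0.10, (2, 0): 0.03,
    (0, 1): 0.21, (1, 1): 0.11, (2, 1): 0.19,
    (0, 2): 0.08, (1, 2): 0.15, (2, 2): 0.08,
}
EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
cov = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())
sdX = math.sqrt(sum((x - EX) ** 2 * p for (x, y), p in joint.items()))
sdY = math.sqrt(sum((y - EY) ** 2 * p for (x, y), p in joint.items()))
rho = cov / (sdX * sdY)   # unit-free; always lies in [-1, 1]
```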

51
Q

What is the correlation coefficient sometimes denoted as?

A

The correlation coefficient between X and Y is sometimes denoted ρXY (and is sometimes called the population correlation).

52
Q

What are the properties of correlation?

A

It can be shown that ρXY ∈ [−1, 1]. In particular:
▶ ρXY = 0 : No correlation (but this does not imply independence)
▶ ρXY = 1 : Perfect positive correlation
- We can write Y = a + b · X with b > 0
▶ ρXY = −1 : Perfect negative correlation
- We can write Y = a + b · X with b < 0

▶ Two independent RVs are always uncorrelated. The
converse is NOT ALWAYS TRUE

53
Q

What is the 3rd important property of variance?

A

For constants a and b,

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab·Cov(X, Y)

It follows immediately that, if X and Y are uncorrelated (so that cov(X,Y) = 0) then:

Var(X + Y) = Var(X) + Var(Y)
And
Var(X - Y) = Var(X) + Var(Y)

54
Q

What is the normal distribution?

A

A normally distributed random variable is a continuous random variable that can take any real value.

▶ Its probability density function is a bell shape.

55
Q

How is a normal distribution written out?

A

X ∼ N(μ, σ²)

  • N = normal
  • μ = E(X)
  • σ² = Var(X)

The pdf is written mathematically as:
f(x) = [1/(σ√(2π))] · exp[−(x − μ)²/(2σ²)], −∞ < x < ∞

56
Q

Why is the mean the same as the median for a normal distribution?

A

Because a normal distribution is symmetrical.

57
Q

How do you standardise a random variable distribution?

A

Z = (X - μ)/σ

E(Z) = 0
V(Z) = 1

E.G. If X ∼ N(10, 4)
Then Z = (X − 10)/2 ∼ N(0, 1)

58
Q

How do you calculate the following?

Suppose X ∼ N(8, 4)
What is P(4 < X < 12) = ?

A

First we normalise X:
Z = (X - 8)/2 ∼ N(0, 1)

P(4 < X < 12) = P((4 − 8)/2 < (X − 8)/2 < (12 − 8)/2)
= P(−2 < Z < 2) ≈ 0.954
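The numeric answer can be checked with the standard-library statistics module (Python 3.8+):

```python
from statistics import NormalDist

X = NormalDist(mu=8, sigma=2)     # X ~ N(8, 4): variance 4, so sd = 2
p = X.cdf(12) - X.cdf(4)          # P(4 < X < 12) = P(-2 < Z < 2)
```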

59
Q

What are some other useful distributions?

A
  • Chi-square distribution
  • t distribution
  • F distribution
60
Q

What is the chi-square distribution?

A

X = ∑(i=1 to n) Zi², where the Zi are independent standard normal random variables.

It is used to describe the distribution of a sum of squared standard normal random variables.

It is also used to test the goodness of fit of a distribution of data, to test whether data series are independent, and to estimate confidence intervals around the variance and standard deviation of a random variable from a normal distribution.

61
Q

What is a T distribution?

A

A T-distribution is similar to the standard normal distribution, but with a more pointed peak and fatter tails.

T = Z/√(X/n), where Z ∼ N(0, 1) and X ∼ χ² with n degrees of freedom, independent of Z.

62
Q

What is the F Distribution?

A

This is used for hypothesis testing in the context of multiple regression analysis.

The graph has a peak that leans towards the f(x) axis.

F = (X1/k1)/(X2/k2), where X1 ∼ χ²(k1) and X2 ∼ χ²(k2) are independent.

63
Q

What are conditional distributions?

A

Pr(Y = y|X = x) means the probability of Y = y given X = x.

Probability that 2 items are bought given that both customers are women: Pr(Y = 2|X = 2)

We fix the value of X.

The conditional PDF of one variable, given (a value of) the other variable, is:
fY|X=x(y) = fXY(x, y)/fX(x)
fX|Y=y(x) = fXY(x, y)/fY(y)

64
Q

How do you compute joint and marginal distributions in a table?

A

You add up each row/column and get the totals. Those totals are the marginals fY and fX.

Example

fXY            X            fY
           0     1     2
      0  0.05  0.10  0.03 | 0.18
Y     1  0.21  0.11  0.19 | 0.51
      2  0.08  0.15  0.08 | 0.31
_________________________________
fX       0.34  0.36  0.30 | 1.00

65
Q

How do you compute conditional distributions in a table?

A

You get the conditional distribution of Y given X.

JOINT and MARGINAL Distributions

fXY            X            fY
           0     1     2
      0  0.05  0.10  0.03 | 0.18
Y     1  0.21  0.11  0.19 | 0.51
      2  0.08  0.15  0.08 | 0.31
_________________________________
fX       0.34  0.36  0.30 | 1.00

CONDITIONAL Distribution of Y given X:

fY|X                    X
               0               1               2
      0  .05/.34 = .147  .10/.36 = .278  .03/.30 = .100
Y     1  .21/.34 = .618  .11/.36 = .306  .19/.30 = .633
      2  .08/.34 = .235  .15/.36 = .417  .08/.30 = .267
_______________________________________________________
Total          1               1               1
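Each column of the conditional table can be reproduced by dividing a column of the joint table by its marginal. A sketch for the X = 1 column (the dict layout is an assumed representation):

```python
# Conditional pdf of Y given X = 1: f_{Y|X}(y|1) = f_{XY}(1, y) / f_X(1).
joint = {
    (0, 0): 0.05, (1, 0): 0.10, (2, 0): 0.03,
    (0, 1): 0.21, (1, 1): 0.11, (2, 1): 0.19,
    (0, 2): 0.08, (1, 2): 0.15, (2, 2): 0.08,
}
fX1 = sum(p for (x, y), p in joint.items() if x == 1)   # marginal Pr(X = 1)

cond = {y: joint[(1, y)] / fX1 for y in (0, 1, 2)}      # sums to 1
```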

66
Q

What is the conditional expectation?

A

The mean of the conditional distribution of Y given that another variable X takes on a value x is the conditional expectation (or conditional mean) of Y given X = x and is denoted by:
E(Y|X = x)

67
Q

What is the conditional expectation function (CEF)?

A

E(Y|X = x) generally changes as we change x. That is, as we allow X to vary, we get a function of X, known as the conditional expectation function (CEF), denoted E(Y|X).