Class Test 2 Flashcards

1
Q

What is the period of a Poisson random variable?

A

Any quantity, such as time or length, as long as the rate is fixed for the experiment.

2
Q

What does a Poisson random variable model?

A

The number of times an event happens in a specified period.

3
Q

What are the two main properties of Poisson random variables?

A

The probability that an event occurs is the same for intervals of the same size.
Events in intervals that do not overlap occur independently of one another.

4
Q

Define random variable.

A

A random variable is a variable which assumes numerical values representing the outcome of an experiment.

5
Q

What is a continuous random variable?

A

A random variable which can assume any numerical value in an interval, i.e. an uncountably infinite number of values.

6
Q

What is the probability of exactly one value, P(X = x), for any continuous random variable?

A

0

7
Q

What does p.d.f. stand for?

A

probability density function

8
Q

Define the conditions for a p.d.f.

A

The output of the p.d.f. is always greater than or equal to zero: f(x) >= 0 for all x.

Probabilities are calculated as areas under the curve of f(x). Hence integrating f(x) from -infinity to infinity should equal 1.

9
Q

Cumulative distribution function for the uniform distribution?

A

0 if x < a
(x - a) / (b - a) if a <= x <= b
1 if x > b
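
The three cases can be sketched as a small function. This is only an illustration: the function name and the values a = 2, b = 10 are made up for the example.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for X ~ U(a, b), assuming a < b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical example: X ~ U(2, 10)
print(uniform_cdf(1, 2, 10))   # x below a
print(uniform_cdf(6, 2, 10))   # x halfway between a and b
print(uniform_cdf(12, 2, 10))  # x above b
```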

10
Q

Cumulative distribution function for exponential distribution.

A

P(X <= x) = 1 - e^(-lambda * x)

11
Q

For an exponential random variable, if we have already waited a units for the event, what is the probability that we have to wait at least another b units?

A

The same as the probability of waiting at least b units from the start, because of the memoryless property: P(X > a + b | X > a) = P(X > b).
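
A quick numerical check of the memoryless property, using P(X > x) = e^(-lambda * x). The values of lambda, a and b are arbitrary choices for illustration.

```python
import math


def exp_survival(x: float, lam: float) -> float:
    """P(X > x) = e^(-lambda * x) for an exponential random variable."""
    return math.exp(-lam * x)


lam, a, b = 0.5, 3.0, 2.0  # illustrative values, not from the flashcards

# P(X > a + b | X > a) = P(X > a + b) / P(X > a)
conditional = exp_survival(a + b, lam) / exp_survival(a, lam)
# P(X > b), the probability of waiting at least b from the start
unconditional = exp_survival(b, lam)

print(conditional, unconditional)  # the two values agree up to rounding
```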

12
Q

Explain the relationship between a Poisson and an exponential random variable.

A

The time between consecutive events of a Poisson process follows an exponential distribution with the same rate lambda.
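
One way to see the link numerically: "no events in a period of length t" (Poisson with mean lambda * t) is the same event as "the wait for the first event exceeds t" (exponential with rate lambda). The sketch below uses the standard Poisson formula P(N = k) = e^(-m) * m^k / k!, which is not stated on the cards, and made-up values for lambda and t.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


lam, t = 2.0, 1.5  # illustrative rate and period length

p_no_events = poisson_pmf(0, lam * t)  # Poisson: zero events in time t
p_wait_longer = math.exp(-lam * t)     # exponential: P(X > t) = 1 - CDF

print(p_no_events, p_wait_longer)
```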

13
Q

Differentiate between Poisson and exponential.

A

Exponential measures wait time between events.

Poisson measures events per period of time.

14
Q

Differentiate between exponential and Weibull distributions.

A

Weibull generalises the exponential but overcomes the memoryless property. So the probability of waiting b after waiting a is not the same as the probability of waiting b.

Allows the failure probability to vary with time.

15
Q

What is a uniform distribution X~U(a,b)?

A

A continuous distribution on the interval from a (lower bound) to b (upper bound) in which all values are equally likely.

16
Q

What is a Weibull distribution X~Weibull(a, β)?

A

a is the shape parameter, which governs the rate at which the probability density decreases with respect to X.

β is the scale parameter which determines the size of the values of X for which the distribution is most concentrated.

17
Q

Cumulative distribution function for Weibull.

A

P(X <= x) = 1 - e^(-(x / β)^a) for x >= 0
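
A minimal sketch of this CDF, with a as the shape and beta as the scale. The function name and the numbers are illustrative. With a = 1 the formula reduces to the exponential CDF with lambda = 1 / beta.

```python
import math


def weibull_cdf(x: float, a: float, beta: float) -> float:
    """P(X <= x) = 1 - e^(-(x / beta)^a), with shape a and scale beta."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / beta) ** a))


# With shape a = 1, Weibull(1, beta) matches an exponential with lambda = 1 / beta
beta = 4.0
x = 2.0
weibull_value = weibull_cdf(x, 1.0, beta)
exponential_value = 1.0 - math.exp(-(1.0 / beta) * x)

print(weibull_value, exponential_value)
print(weibull_cdf(x, 2.5, beta))  # a different shape gives a different probability
```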

18
Q

What is the effect of each of the following on the graph?
(a) Increase mean
(b) Decrease mean
(c) Increase variance
(d) Decrease variance

A

(a) Shape the same, but location shifts to the right.
(b) Shape the same, but location shifts to the left.
(c) Shape flattened, more spread out, but location the same.
(d) Shape narrowed, more concentrated, but location the same.

19
Q

What is a standard normal distribution?

A

A normal distribution with mean 0 and variance 1.

20
Q

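How do you calculate each of the following for a standard normal X, using table values of P(X < a)?
(a) P(X < -a)
(b) P(X > a)
(c) P(X > -a)

A

(a) P(X < -a) = P(X > a) = 1 - P(X < a)
(b) P(X > a) = 1 - P(X < a)
(c) P(X > -a) = P(X < a)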
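
These identities rely on the symmetry of the standard normal about 0. They can be checked with Python's standard-library `statistics.NormalDist`. The value a = 1.5 is an arbitrary choice for illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal
a = 1.5                        # illustrative value

p_below_a = Z.cdf(a)             # P(X < a), the value a Z table gives
p_below_minus_a = Z.cdf(-a)      # (a) P(X < -a)
p_above_a = 1 - Z.cdf(a)         # (b) P(X > a)
p_above_minus_a = 1 - Z.cdf(-a)  # (c) P(X > -a)

print(p_below_minus_a, 1 - p_below_a)  # (a) matches 1 - P(X < a)
print(p_above_minus_a, p_below_a)      # (c) matches P(X < a)
```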

21
Q

Z table formula?

A

Z = (X - μ) / σ
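
A short worked example of standardising, assuming a made-up variable X ~ Normal(100, 15^2): converting x to a Z score and looking it up on the standard normal gives the same probability as using the original distribution directly.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical mean and standard deviation
x = 130.0

z = (x - mu) / sigma  # Z = (X - mu) / sigma

p_via_z = NormalDist().cdf(z)            # table lookup on the standard normal
p_direct = NormalDist(mu, sigma).cdf(x)  # same probability without standardising

print(z, p_via_z, p_direct)
```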

22
Q

For a very large n, what other distribution can be approximated by the normal distribution?

A

Binomial
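
A rough numerical comparison, assuming the usual approximation Binomial(n, p) ≈ Normal(np, np(1 - p)). The values of n, p and k are made up, and the 0.5 continuity correction is an extra refinement not mentioned on the card.

```python
import math
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n, p, k = 1000, 0.5, 510  # illustrative values

exact = binomial_cdf(k, n, p)
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = normal.cdf(k + 0.5)  # continuity correction (an added assumption)

print(exact, approx)
```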

23
Q

What is a sample statistic?

A

A sample statistic is a function that summarises a random sample. Because the sample is random, the statistic is also subject to randomness.

T(x) = T(x1, ..., xn)

where x = (x1, ..., xn) is a random sample

24
Q

On what does the distribution of the sample statistic depend?

A

The sample size n (and the distribution of the population being sampled).

25
Q

What is a statistical estimator?

A

A sample statistic that is used to estimate a parameter θ of the population.

26
Q

What are the properties of T(X)?

A

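Unbiasedness: How close the average value of T is to the true parameter (T is unbiased when its expected value equals θ).
Standard Deviation: Precision of T. Low variability is good, as the estimates are then tightly concentrated around the true value when T is unbiased.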

27
Q

How are estimators denoted?

A

With a ^ (hat) symbol above the letter.

28
Q

Central Limit Theorem Formula. Explain.

A

X bar ~ Normal(μ, σ^2 / n), approximately, for large n

This is the approximate distribution of the sample mean, and it holds even when we don't know the type of distribution of X.
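
A small simulation sketch of this result. It draws repeated samples from an exponential distribution (clearly not normal) and checks that the sample means have mean close to μ and variance close to σ^2 / n. The rate, sample size, number of repeats and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

lam = 1.0         # exponential rate: mean 1 / lam, variance 1 / lam^2
n = 50            # size of each sample
repeats = 20_000  # number of samples drawn

# One sample mean per repeated sample of size n
sample_means = [
    statistics.fmean(random.expovariate(lam) for _ in range(n))
    for _ in range(repeats)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(mean_of_means)  # should be close to mu = 1
print(var_of_means)   # should be close to sigma^2 / n = 0.02
```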

29
Q

If X is not normal does the CLT still apply?

A

Yes, if n is sufficiently large then the sample mean X bar will be approx. normal for any distribution of X (with finite variance).

30
Q

Expected value of continuous distribution?

A

E[X] = integral from -infinity to infinity of x * f(x) dx

31
Q

Variance of continuous distribution?

A

Var(X) = E[X^2] - (E[X])^2, where E[X^2] = integral from -infinity to infinity of x^2 * f(x) dx
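
As a sanity check, both integrals can be approximated numerically. The sketch below uses a simple midpoint rule on a made-up U(2, 10) density, for which the exact answers are E[X] = (a + b) / 2 = 6 and Var(X) = (b - a)^2 / 12.

```python
def integrate(g, lo: float, hi: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


a, b = 2.0, 10.0  # illustrative bounds for X ~ U(a, b)


def pdf(x: float) -> float:
    """Density of U(a, b) on [a, b]; the density is zero elsewhere."""
    return 1.0 / (b - a)


total = integrate(pdf, a, b)                               # should be 1
mean = integrate(lambda x: x * pdf(x), a, b)               # E[X]
second_moment = integrate(lambda x: x ** 2 * pdf(x), a, b)  # E[X^2]
variance = second_moment - mean ** 2                       # E[X^2] - (E[X])^2

print(total, mean, variance)
```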

32
Q

What is a confidence interval?

A

A confidence interval for a population parameter is an interval, calculated from the sample, which contains the true parameter value with a stated level of confidence (e.g. 95% or 99%).

33
Q

Formula for a confidence interval by the central limit theorem. What is a?

A

[X bar - Z(a/2) * s / root(n) , X bar + Z(a/2) * s / root(n)]
a is the decimal value for the proportion of the distribution we allow to fall outside the interval (e.g. a = 0.05 for a 95% interval).
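
A minimal sketch of the interval calculation. The function name and the summary values (sample mean 50, s = 8, n = 64) are hypothetical. Z(a/2) is obtained from the inverse CDF of the standard normal rather than from a table.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar: float, s: float, n: int, alpha: float):
    """Return (lower, upper) for a (1 - alpha) confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z(a/2)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample summary
low, high = z_confidence_interval(x_bar=50.0, s=8.0, n=64, alpha=0.05)
print(low, high)

# The Z value used for a 99% interval (alpha = 0.01)
z_99 = NormalDist().inv_cdf(1 - 0.01 / 2)
print(z_99)
```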

34
Q

What is the question which hypothesis testing seeks to answer?

A

Is the relationship observed in the sample clear enough to be called statistically significant, or could it have been due to chance?

35
Q

What is the null hypothesis (H0)?

A

Usually says that nothing changed or happened. The status quo, or what is currently believed.

36
Q

What is the alternative hypothesis (Ha)?

A

The complementary hypothesis. It is often the alternative that challenges the status quo; in an experiment it is what the researcher wants to show.

37
Q

What does a p-value measure?

A

Assuming that the null hypothesis is true, how likely it would be to obtain results at least as extreme as those we have observed.

38
Q

What are the two outcomes of hypothesis testing?

A

Fail to reject the null hypothesis.
Reject the null hypothesis.

39
Q

Distinguish between a type 1 and type 2 error.

A

Type 1: Rejecting the null hypothesis H0 when it is in fact true. (False Positive)
Type 2: Failing to reject the null hypothesis H0 when it is in fact false. (False Negative)

40
Q

What is a Z-test? Formula.

A

Hypothesis testing on the sample mean when the population standard deviation σ is known.

z = (x bar - μ) / (σ / root(n))
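
A sketch of the calculation with made-up numbers. The two-sided p-value is an assumption here; a one-sided test would use only one tail.

```python
import math
from statistics import NormalDist


def z_test(x_bar: float, mu0: float, sigma: float, n: int):
    """Return the z statistic and a two-sided p-value for H0: mu = mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value


# Hypothetical data: sample mean 52 from n = 64, testing mu0 = 50 with sigma = 8
z, p = z_test(x_bar=52.0, mu0=50.0, sigma=8.0, n=64)
print(z, p)
```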

41
Q

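How do you know whether the failure rate of a Weibull distribution increases or decreases with time?

A

If a is greater than one, it increases with time.
If a is less than one, it decreases with time.
If a equals one, it is constant (the exponential case).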

42
Q

When making a 99% confidence interval, what Z score is used?

A

2.58 (the exact value is about 2.576, so round up).

43
Q

Why would you use a t-test rather than a z-test?

A

The population standard deviation is unknown, so we estimate it with the standard deviation of the sample. The t-test is also valid for smaller samples.

44
Q

If you have the statistics summary generated by Python, how do you find (a) the fitted line, (b) the correlation coefficient, (c) the coefficient of determination and (d) a t-test to assess the utility of the linear regression model?

A

Fitted Line: Y = Intercept + x-coefficient/slope * X
Correlation Coefficient: Root(R^2), with the same sign as the slope
Coefficient of Determination: R^2
T-test: Take the x-coefficient/slope and find its associated p-value. If p-value < a (the significance level), we have a significant result: we reject the null hypothesis that the slope coefficient is equal to zero, meaning that there is a significant linear relationship between X and Y (e.g. between house size and price).

45
Q

How do you calculate Sxx and Sxy of the data?

A

Sxx = sum over all i of (xi - x bar)^2

Sxy = sum over all i of (xi - x bar)(yi - y bar)

46
Q

How do you calculate B0 (the intercept) and B1 (the slope, e.g. for advertising)?

A

B0 (intercept) = y bar - B1 * x bar
B1 (slope, e.g. advertising) = Sxy / Sxx
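
The Sxx, Sxy, B1 and B0 formulas can be sketched in a few lines. The five data points are made up purely for illustration.

```python
# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sxx and Sxy as sums of deviations from the means
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # intercept

print(s_xx, s_xy)
print(b0, b1)  # fitted line: Y = b0 + b1 * X
```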

47
Q

Which goes on the x-axis in a scatterplot: the independent variable or the dependent variable?

A

Independent Variable

48
Q

Give a formula for the correlation coefficient.

A

r = Sxy / root(Sxx * Syy)

49
Q

If we have a scatterplot and histogram of the residuals, what can we say about the model’s assumptions?

A

To satisfy the model’s assumptions:
Histogram:
Centered at 0.
A bell-shaped form shows the residuals are approx. normal.
Scatterplot:
Residuals are randomly spread around zero, meaning that the assumption of constant variance is satisfied.

50
Q

What is v? How is it calculated?

A

Degrees of freedom. v = n - 1 for a t-test on a single sample mean.

51
Q

Explain the difference between a deterministic model and a probabilistic model.

A

Deterministic: The value of one variable Y is completely determined by the value of another variable X.

Probabilistic: Allows for unexplained variation or random error. Consists of a deterministic component and a random error component.

52
Q

What is the general form of the probabilistic model?

A

Y = deterministic component + Random Error

Y is the dependent variable
X is the explanatory variable (deterministic component)

53
Q

What type of variables are X and Y?

A

X is an independent (predictor or explanatory) variable
Y is a dependent or response variable

54
Q

Explain what different values of the correlation coefficient mean.

A

r near 0, no linear correlation
r close to 1, strong positive correlation
r close to -1, strong negative correlation

55
Q

What form does the deterministic component of the probabilistic model take?

A

B0 + B1X

56
Q

What assumptions are made about the random error component of the probabilistic model?

A
  • Error follows a normal distribution.
  • Has mean 0.
  • Variance is constant, sigma^2. (The spread on either side doesn’t increase or decrease with X, so the residuals form a rectangle-shaped band.)
57
Q

How is the test statistic calculated if I want to find the p-value for the slope in linear regression?

A

t = B1 / root(MSE / Sxx), compared with a t distribution with n - 2 degrees of freedom

58
Q

Give formula for MSE.

A

MSE = SSE / (n-2)

59
Q

Explain how to calculate SSE.

A

Take the difference between the fitted line and the observed value, square it, and add it to the result of the same calculation for every point.
I.e. SSE = sum from i = 1 to n of (observed y_i - fitted y_i)^2
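
Putting SSE, MSE and the slope test statistic together on a small made-up data set. This is a sketch of the formulas on these cards only; finding the p-value would additionally need a t distribution with n - 2 degrees of freedom, which is not included here.

```python
import math

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx         # slope
b0 = y_bar - b1 * x_bar  # intercept

# SSE: squared gaps between observed values and the fitted line
fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

mse = sse / (n - 2)                  # MSE = SSE / (n - 2)
t_stat = b1 / math.sqrt(mse / s_xx)  # t = B1 / root(MSE / Sxx)

print(sse, mse, t_stat)
```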

60
Q

What do SSE and MSE stand for?

A

SSE: Sum of Squared Errors
MSE: Mean Squared Error

61
Q

What are the null and alternative hypotheses for linear regression?

A

H0: B1 = 0; Y does not depend on X
HA: B1 != 0; There is a linear relationship between the two variables

62
Q

What is the coefficient of determination? How is it calculated?

A

R^2 = (correlation coefficient)^2 = (Sxy)^2 / (Sxx * Syy)
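
A short check that the two routes to R^2 agree, using r = Sxy / root(Sxx * Syy) and a small made-up data set.

```python
import math

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)         # correlation coefficient
r_squared_from_r = r ** 2                 # (correlation coefficient)^2
r_squared_direct = s_xy ** 2 / (s_xx * s_yy)  # (Sxy)^2 / (Sxx * Syy)

print(r, r_squared_from_r, r_squared_direct)
```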