Class Test 2 Flashcards
What is the period of a Poisson random variable?
Any quantity, such as time or length, as long as the rate is fixed for the experiment.
What does a Poisson random variable model?
The number of times an event happens in a specified period.
What are the two main properties of Poisson random variables?
Probability that an event occurs is the same for intervals of the same size.
The numbers of events in intervals that do not overlap are independent.
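A minimal Python sketch of the Poisson probabilities, using the standard formula P(X = k) = e^(-lambda) * lambda^k / k!. The function name poisson_pmf and the rate of 3 events per period are illustrative assumptions, not taken from the course notes.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam per period."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical example: on average 3 events per period.
lam = 3.0
probs = [poisson_pmf(k, lam) for k in range(50)]

total = sum(probs)                              # should be very close to 1
mean = sum(k * p for k, p in enumerate(probs))  # should be very close to lam

print(f"P(X = 0) = {probs[0]:.4f}")
print(f"sum of probabilities = {total:.6f}, mean = {mean:.6f}")
```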
Define random variable.
A random variable is a variable which assumes numerical values representing the outcome of an experiment.
What is a continuous random variable?
A random variable which can assume any value in an interval, so it has an uncountably infinite number of possible numerical values.
What is the probability that X takes exactly one particular value, for any continuous random variable?
0
What does p.d.f. stand for?
probability density function
Define the conditions for p.d.f.
The output of the pdf is always greater than or equal to zero.
Probabilities are areas under the curve of the p.d.f. f(x), so integrating f(x) from -infinity to infinity should equal 1.
Cumulative distribution function for uniform distribution?
0 if x < a
(x - a) / (b - a) if a <= x <= b
1 if x > b
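The three cases of the uniform c.d.f. can be sketched in Python as follows. The function name uniform_cdf and the interval from 0 to 10 are assumptions chosen for illustration.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for X ~ U(a, b), with a < b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical example: X ~ U(0, 10).
below = uniform_cdf(-1.0, 0.0, 10.0)   # left of the interval
inside = uniform_cdf(5.0, 0.0, 10.0)   # halfway through the interval
above = uniform_cdf(11.0, 0.0, 10.0)   # right of the interval
print(below, inside, above)
```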
Cumulative distribution function for exponential distribution.
P(X <= x) = 1 - e^(-lambda * x) for x >= 0
If we have already waited a units for the event, what is the probability that we have to wait at least another b units, for an exponential random variable?
The same as the probability of waiting at least b units from the start, because of the memoryless property: P(X > a + b | X > a) = P(X > b).
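A short numerical check of the memoryless property, assuming the exponential c.d.f. given above. The values of lambda, a and b are made up for the example.

```python
import math


def exp_survival(x: float, lam: float) -> float:
    """P(X > x) = e^(-lam * x) for an exponential random variable, x >= 0."""
    return math.exp(-lam * x)


# Hypothetical values: rate 0.5, already waited a = 2, wait another b = 3.
lam, a, b = 0.5, 2.0, 3.0

# P(X > a + b | X > a) = P(X > a + b) / P(X > a)
conditional = exp_survival(a + b, lam) / exp_survival(a, lam)
unconditional = exp_survival(b, lam)  # P(X > b)

print(f"conditional = {conditional:.6f}, unconditional = {unconditional:.6f}")
```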
Explain the relationship between a Poisson and an exponential random variable.
The time between consecutive events of a Poisson process follows an exponential distribution with the same rate lambda.
Differentiate between Poisson and exponential.
Exponential measures wait time between events.
Poisson measures events per period of time.
Differentiate between exponential and Weibull distributions.
Weibull also models waiting time but is not restricted by the memoryless property, so the probability of waiting a further b after already waiting a is not the same as the probability of waiting b.
It allows the failure probability to vary with time.
What is a uniform distribution X~U(a,b)?
A distribution on the interval from a (lower bound) to b (upper bound) where all values are equally likely.
What is a Weibull distribution X~Weibull(a, β)?
a is the shape parameter, which is the rate at which the probability density decreases with respect to X.
β is the scale parameter which determines the size of the values of X for which the distribution is most concentrated.
Cumulative distribution function for Weibull.
P(X <= x) = 1 - e^(-(x / β)^a) for x >= 0
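A sketch of the Weibull c.d.f. in Python, assuming the form 1 - e^(-(x / β)^a). It also checks two facts from the cards above: with a = 1 the Weibull reduces to an exponential with rate 1 / β, and with a different shape the memoryless property no longer holds. The function names and parameter values are illustrative assumptions.

```python
import math


def weibull_survival(x: float, a: float, beta: float) -> float:
    """P(X > x) = e^(-(x / beta)^a) for x >= 0 (shape a, scale beta)."""
    if x < 0:
        return 1.0
    return math.exp(-((x / beta) ** a))


def weibull_cdf(x: float, a: float, beta: float) -> float:
    """P(X <= x) = 1 - P(X > x)."""
    return 1.0 - weibull_survival(x, a, beta)


beta = 2.0  # hypothetical scale

# Shape a = 1: same as an exponential c.d.f. with lambda = 1 / beta.
exponential_case = weibull_cdf(3.0, 1.0, beta)

# Shape a = 2: waiting 1 more unit after already waiting 1 unit
# is not the same as waiting 1 unit from the start.
waited, extra = 1.0, 1.0
conditional = weibull_survival(waited + extra, 2.0, beta) / weibull_survival(waited, 2.0, beta)
unconditional = weibull_survival(extra, 2.0, beta)

print(f"conditional = {conditional:.4f}, unconditional = {unconditional:.4f}")
```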
What is the effect of each of the following on the graph of a normal distribution?
(a) Increase mean
(b) Decrease mean
(c) Increase variance
(d) Decrease variance
(a) Shape the same, but location shifts to the right. The curve stays symmetric.
(b) Shape the same, but location shifts to the left. The curve stays symmetric.
(c) Shape flattened, more spread out but location the same.
(d) Shape narrowed, more concentrated but location the same.
What is a standard normal distribution?
A normal distribution with mean 0 and variance 1.
How to calculate the following for a standard normal X, with a > 0?
(a) P(X < -a)
(b) P(X > a)
(c) P(X > -a)
(a) = P(X > a) = 1 - P(X < a)
(b) = 1 - P(X < a)
(c) = P(X < a)
Z table formula?
Z = (X - μ) / σ
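The symmetry rules and the Z formula can be checked with NormalDist from Python's standard statistics module, which plays the role of the Z table here. The value a = 1.5 and the numbers passed to z_score are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)
a = 1.5  # hypothetical positive value

p_less_neg = std_normal.cdf(-a)           # (a) P(Z < -a)
p_greater = 1.0 - std_normal.cdf(a)       # (b) P(Z > a) = 1 - P(Z < a)
p_greater_neg = 1.0 - std_normal.cdf(-a)  # (c) P(Z > -a)


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardise x: Z = (x - mu) / sigma."""
    return (x - mu) / sigma


# Hypothetical example: x = 130 from a normal with mean 100 and s.d. 15.
z = z_score(130.0, 100.0, 15.0)
print(p_less_neg, p_greater, p_greater_neg, z)
```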
For a very large n, what other distribution can be approximated by the normal distribution?
Binomial
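A rough comparison of an exact binomial probability with its normal approximation, using mean n * p and variance n * p * (1 - p). The continuity correction (adding 0.5) is a common refinement that is assumed here and is not stated in the card. The values of n, p and k are made up.

```python
import math
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical example: 100 trials, success probability 0.5, P(X <= 55).
n, p, k = 100, 0.5, 55

exact = binomial_cdf(k, n, p)

# Normal with the same mean and standard deviation as the binomial.
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = normal.cdf(k + 0.5)  # continuity correction (assumed refinement)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```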
What is a sample statistic?
A sample statistic is a function that summarises a random sample. Because the sample is random, the statistic is also subject to randomness.
T(X) = T(X1, ..., Xn)
where X = (X1, ..., Xn) is a random sample
On what does the distribution of the sample statistic depend?
sample size n
What is a statistical estimator?
A sample statistic that is used to estimate a parameter θ of the population.
What are the properties of T(X)?
Unbiasedness: How close the average of T is to the true parameter. T is unbiased if E[T] = θ.
Standard Deviation: Precision of T. Low variability is good, as the estimates are then concentrated around the true value.
How are estimators denoted?
^ symbol above letter.
Central Limit Theorem Formula. Explain.
X bar ~ Normal(μ, σ^2 / n), approximately, for large n.
The sample mean is approximately normally distributed, even when we don't know the type of distribution of X.
If X is not normal does the CLT still apply?
Yes, if n is sufficiently large then X bar will be approx. normal for any distribution of X with finite variance.
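A small simulation sketch of the central limit theorem, assuming an exponential population with rate 1 (so mean 1 and s.d. 1), which is clearly not normal. The sample size, number of repetitions and seed are arbitrary choices for the illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 50       # size of each sample
reps = 2000  # number of samples drawn

# Each entry is the mean of one sample of n exponential(1) values.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.mean(sample_means)  # CLT: close to mu = 1
sd_of_means = statistics.stdev(sample_means)   # CLT: close to sigma / root(n)
theoretical_sd = 1.0 / math.sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"s.d. of sample means = {sd_of_means:.3f} (theory {theoretical_sd:.3f})")
```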
Expected value of continuous distribution?
E[X] = integral of x * f(x) dx over all x
Variance of continuous distribution?
Var(X) = integral of x^2 * f(x) dx - (E[X])^2
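The two integrals can be approximated numerically. The sketch below assumes an exponential p.d.f. with rate 2 (true mean 0.5, true variance 0.25) and uses a simple midpoint rule, cutting the range off at 20 where the remaining area is negligible. The helper integrate is a hypothetical name, not a library function.

```python
import math


def integrate(g, lo: float, hi: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


lam = 2.0  # hypothetical rate


def pdf(x: float) -> float:
    """Exponential p.d.f. f(x) = lam * e^(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x)


upper = 20.0  # stands in for infinity in this sketch

total_area = integrate(pdf, 0.0, upper)                           # should be about 1
expected = integrate(lambda x: x * pdf(x), 0.0, upper)            # E[X]
second_moment = integrate(lambda x: x * x * pdf(x), 0.0, upper)   # E[X^2]
variance = second_moment - expected**2                            # E[X^2] - (E[X])^2

print(f"area = {total_area:.6f}, E[X] = {expected:.6f}, Var(X) = {variance:.6f}")
```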
What is a confidence interval?
A confidence interval for a population parameter is an interval, computed from the sample, which contains the true parameter value with a stated level of confidence (e.g. 95%).
Formula for confidence interval by central limit theorem. What is a?
[X bar - Z(a/2) * s / root(n) , X bar + Z(a/2) * s / root(n)]
a is the significance level: the decimal value for the probability we allow the interval to miss the true parameter, e.g. a = 0.05 for a 95% confidence interval.
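A sketch of the interval calculation from summary statistics, with NormalDist.inv_cdf standing in for the Z table. The function name confidence_interval and the numbers (sample mean 50, s = 10, n = 100) are invented for the example.

```python
import math
from statistics import NormalDist


def confidence_interval(x_bar: float, s: float, n: int, alpha: float):
    """CLT-based interval: x_bar +/- Z(alpha/2) * s / root(n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 critical value
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical summary statistics for a 95% interval (alpha = 0.05).
lower, upper = confidence_interval(x_bar=50.0, s=10.0, n=100, alpha=0.05)
z_95 = NormalDist().inv_cdf(0.975)  # critical value for 95%
z_99 = NormalDist().inv_cdf(0.995)  # critical value for 99%

print(f"95% interval: [{lower:.2f}, {upper:.2f}]")
print(f"z for 95% = {z_95:.3f}, z for 99% = {z_99:.3f}")
```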
What is the question which hypothesis testing seeks to answer?
Is the relationship observed in the sample clear enough to be called statistically significant, or could it have been due to chance?
What is the null hypothesis (H0)?
Usually says that nothing changed or happened. The status quo, or what is currently believed.
What is the alternative hypothesis (Ha)?
The complementary hypothesis. Often it is the alternative that challenges the status quo; in an experiment it is what the researcher wants to prove.
What does a p-value measure?
Assuming that the null hypothesis is true, it is the probability of obtaining results at least as extreme as those we have observed.
What are the two outcomes of hypothesis testing?
Fail to reject the null hypothesis.
Reject the null hypothesis.
Distinguish between a type 1 and type 2 error.
Type 1: Rejecting the null hypothesis H0 when it is in fact true is a type 1 error. (False Positive)
Type 2: Failing to reject the null hypothesis H0 when it is in fact false is a type 2 error. (False Negative)
What is a Z-test? Formula.
Hypothesis testing on sample mean.
z = (x bar - μ) / (σ / root(n))
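A minimal sketch of a Z-test on a sample mean, assuming a two-sided alternative and a known population standard deviation. The function name z_test and all the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(x_bar: float, mu0: float, sigma: float, n: int):
    """Return (z, two-sided p-value) for H0: population mean = mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: sample mean 52 from n = 64, testing mu0 = 50 with sigma = 8.
z, p_value = z_test(x_bar=52.0, mu0=50.0, sigma=8.0, n=64)

alpha = 0.05  # significance level chosen for the example
reject_h0 = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```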
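How do you know whether the failure rate of a Weibull distribution increases or decreases with time?
If a is greater than one it increases with time
If a is less than one it decreases with time
If a equals one it is constant, which is the exponential distribution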
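The behaviour of the failure rate can be seen from the Weibull hazard function h(x) = (a / β) * (x / β)^(a - 1), which is the standard form for shape a and scale β but is not written out in the cards. The scale and time points below are made up.

```python
def weibull_hazard(x: float, a: float, beta: float) -> float:
    """Failure (hazard) rate h(x) = (a / beta) * (x / beta)**(a - 1), for x > 0."""
    return (a / beta) * (x / beta) ** (a - 1)


beta = 2.0                    # hypothetical scale
times = [0.5, 1.0, 2.0, 4.0]  # hypothetical time points

increasing = [weibull_hazard(t, 2.0, beta) for t in times]  # a > 1
decreasing = [weibull_hazard(t, 0.5, beta) for t in times]  # a < 1
constant = [weibull_hazard(t, 1.0, beta) for t in times]    # a = 1

print(increasing)
print(decreasing)
print(constant)
```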
When making a 99% confidence interval, what is the decimal Z score used?
2.58 (2.576 rounded up to two decimal places).
Explain why you would use a t-test rather than a z-test?
The population standard deviation is unknown, so we estimate it with the standard deviation of the sample. Also valid for smaller samples.
If you have the statistics summary generated by Python, how do you find (a) the fitted line, (b) the correlation coefficient, (c) the coefficient of determination, (d) a t-test to assess the utility of the linear regression model?
Fitted Line: Y = Intercept + x-coefficient/slope * X
Correlation Coefficient: Root(R^2), taking the same sign as the slope
Coefficient of Determination: R^2
T-test: Take x-coefficient/slope and find associated p-value. If p-value < a, we have a significant result. We reject the null hypothesis that the slope coefficient is equal to zero, meaning that we have a significant linear relationship between X and Y (e.g. between house size and price).
How do you calculate Sxx and Sxy of the data?
Sxx = sum of all(xi - X bar)^2
Sxy = sum of all(xi - X bar)(yi - Y bar)
How do you calculate B0 (intercept) and B1 (slope, e.g. the advertising coefficient)?
B0 (intercept) = y bar - B1 * x bar
B1 (slope) = Sxy / Sxx
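The Sxx, Sxy, B1 and B0 formulas can be sketched in plain Python as below. The five data points are invented for the example and the variable names are assumptions.

```python
# Hypothetical data: x is the independent variable, y the response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sxx = sum of (xi - x bar)^2, Sxy = sum of (xi - x bar)(yi - y bar)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx           # slope
b0 = y_bar - b1 * x_bar  # intercept

print(f"Sxx = {sxx}, Sxy = {sxy}")
print(f"fitted line: Y = {b0:.2f} + {b1:.2f} * X")
```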
Which goes on the x-axis in a scatterplot: the independent variable or the dependent variable?
Independent Variable
Give a formula for the correlation coefficient.
r = Sxy / root(Sxx * Syy)
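A sketch of the correlation formula as a Python function, tried on two made-up data sets that lie exactly on a line, so r should come out at 1 and -1. The function name correlation is an assumption.

```python
import math


def correlation(x, y) -> float:
    """r = Sxy / root(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical x values
r_up = correlation(x, [2 * xi + 1 for xi in x])  # points on a rising line
r_down = correlation(x, [-xi for xi in x])       # points on a falling line
r_noisy = correlation(x, [2.0, 4.0, 5.0, 4.0, 5.0])  # scattered points

print(r_up, r_down, round(r_noisy, 4))
```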
If we have a scatterplot and histogram of the residuals, what can we say about the model’s assumptions?
To satisfy the model’s assumptions:
Histogram:
Centered at 0
Bell-shaped form shows the residuals are approx. normal
Scatterplot:
Residuals are randomly spread around zero with no pattern and a roughly even spread, meaning that the assumption of constant variance is satisfied.
What is v? How is it calculated?
Degrees of freedom. v = n - 1 for a one-sample t-test.
Explain the difference between a deterministic model and a probabilistic model.
Deterministic: The value of one variable Y is completely determined by the value of another variable X.
Probabilistic: Allows for unexplained variation or random error. Consists of a deterministic component and a random error component.
What is the general form of the probabilistic model?
Y = deterministic component + Random Error
Y is the dependent variable
X is the explanatory variable (deterministic component)
What type of variables are X and Y?
X is an independent (predictor or explanatory) variable
Y is a dependent or response variable
Explain the different values of the correlation coefficient.
r near 0, no linear correlation
r close to 1, strong positive correlation
r close to -1, strong negative correlation
What form does the deterministic component of the probabilistic model take?
B0 + B1X
What assumptions are made about the random error component of the probabilistic model?
- Error follows normal distribution.
- Has mean 0.
- Variance is sigma^2. (Variance either side doesn’t increase or decrease, rectangle shaped)
How is the test statistic calculated if I wanted to find p-value for linear regression?
t = B1 / root(MSE / Sxx), compared with a t distribution with n - 2 degrees of freedom
Give formula for MSE.
MSE = SSE / (n-2)
Explain how to calculate SSE.
Take the difference between the observed value and the fitted line at each point, square it, and add up the results for every point.
I.e. SSE = Sum from i = 1 to n of (Observed point i - Estimated point i)^2
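The SSE, MSE and test statistic steps can be sketched together as follows, on a small invented data set. Only the t statistic is computed here; finding its p-value needs a t table or a statistics library.

```python
import math

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx           # slope
b0 = y_bar - b1 * x_bar  # intercept

# Estimated points on the fitted line, one per observation.
fitted = [b0 + b1 * xi for xi in x]

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of squared errors
mse = sse / (n - 2)                                     # mean squared error
t_stat = b1 / math.sqrt(mse / sxx)                      # test statistic for H0: B1 = 0

print(f"SSE = {sse:.3f}, MSE = {mse:.3f}, t = {t_stat:.3f}")
```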
What do SSE and MSE stand for?
Sum of Squared Errors
Mean Squared Error
What is the null and alternative hypothesis for linear regression?
H0: B1 = 0; Y does not depend on X
HA: B1 != 0; There is a linear relationship between the two variables
What is coefficient of determination? How is it calculated?
R^2. (correlation coefficient)^2 or (Sxy)^2 / (Sxx * Syy)
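A sketch that computes R^2 from Sxy, Sxx and Syy on invented data. As a cross-check it also uses the identity R^2 = 1 - SSE / Syy, which holds for a least-squares line with an intercept but is not stated in the cards.

```python
# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Coefficient of determination from the card's formula.
r_squared = sxy**2 / (sxx * syy)

# Cross-check (assumed identity): proportion of variation in y explained by the line.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared_from_sse = 1.0 - sse / syy

print(f"R^2 = {r_squared:.3f}, via SSE = {r_squared_from_sse:.3f}")
```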