Exercise insights Flashcards
How do you calculate correlation in LibreOffice Calc?
Select the columns you want to correlate, then choose Data -> Statistics -> Correlation from the menu bar. Alternatively, use the CORREL function.
Why is the correlation coefficient not the slope of the estimated line for the relationship between two variables?
Because the correlation coefficient measures the strength of the linear relationship, that is, how closely the scattered points follow a line, not how steeply that line rises or falls. The slope is the job of the regression line.
Is the concept of dependent and independent variables related to correlation or regression
Regression
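Is correlation a measure of how large the covariance is between two variables?
Yes. It is the covariance divided by the product of the two standard deviations, so it always lies between -1 and 1.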
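The cards above on correlation versus slope can be checked numerically. This is a minimal sketch with made-up data: the helper names (`covariance`, `correlation`, `ols_slope`) and the numbers are illustrative assumptions, not part of the exercise.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    # Sample covariance (divides by n - 1).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Covariance standardized by both standard deviations.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def ols_slope(xs, ys):
    # Covariance divided by the variance of x only.
    return covariance(xs, ys) / covariance(xs, xs)

# Hypothetical data: two perfectly linear relationships with different slopes.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_steep = [10.0 * x for x in xs]
ys_flat = [0.1 * x for x in xs]

print(correlation(xs, ys_steep), ols_slope(xs, ys_steep))
print(correlation(xs, ys_flat), ols_slope(xs, ys_flat))
```

Both data sets have the same correlation because the points lie exactly on a line, while the slopes differ by a factor of 100.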
How do you get the estimated slope coefficient of two variables in LibreOffice Calc?
From the menu bar, choose Data -> Statistics -> Regression, then select the dependent and independent variable ranges and the cell where you want the results. The slope coefficient appears under Coefficients next to X1 if you only have two variables.
What does it mean when we say that the estimated slope coefficient is statistically significant?
That there is strong evidence that the estimated relationship between the independent variable and the dependent variable in a regression model is not due to random chance. A relationship is statistically significant if the p-value is below the chosen significance level. Significance alone does not rule out a third factor that influences both variables; that is a question of causality.
Can a regression model as a whole be statistically insignificant?
Yes, if the independent variables do not collectively explain a significant portion of the variation in the dependent variable. This can be due to a small sample size, noise in the data, a strong correlation between the independent variables, or simply a weak relationship between the independent variables and the dependent variable.
What happens when the slope coefficient is equal to zero?
It means the regression model detects no linear relationship between the variables.
If the p-value is high and the t-value is low, is the relationship statistically significant?
No. A high t-value (in absolute terms) indicates strong evidence of a relationship between the variables, and a low p-value means that the chance of observing such a t-value under the null hypothesis is very small. So only when the t-value is high and the p-value is small is the null hypothesis of b being 0 probably false and the relationship statistically significant.
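As a rough illustration of how the t-value and p-value of a slope move together, here is a sketch for a simple regression. It assumes homoskedastic errors and uses the large-sample normal approximation for the p-value instead of the exact t distribution; the function name `slope_t_test` and the data are hypothetical.

```python
import math
from statistics import NormalDist

def slope_t_test(xs, ys):
    """Return (slope, t-value, approximate two-sided p-value) for H0: slope = 0."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Sum of squared residuals and the homoskedasticity-only standard error.
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)
    t = b1 / se_b1
    # Normal approximation to the t distribution (assumption: n is large).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return b1, t, p

xs = list(range(1, 41))
# Hypothetical data: a clear trend plus alternating +1/-1 "noise".
ys_trend = [2.0 + 0.5 * x + (-1) ** x for x in xs]
# Hypothetical data: only the alternating noise, no trend.
ys_noise = [float((-1) ** x) for x in xs]

print(slope_t_test(xs, ys_trend))  # high t-value, low p-value
print(slope_t_test(xs, ys_noise))  # low t-value, high p-value
```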
How do you insert a trend line and its function in a scatter plot in LibreOffice Calc?
You create a scatter plot by selecting the related columns and clicking the chart icon, after which you choose the scatter plot type. Once the scatter plot is made, you can right-click the dots you want a trend line for and choose the trend line option in the drop-down menu.
The residual is the difference between the average of the dependent variable and the observed dependent variable
No, that is the deviation from the mean. The residual is the difference between the actual value and the estimator's prediction at that point.
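To make the distinction on this card concrete, the sketch below fits a line to four made-up points and prints both the residuals (actual minus predicted) and the distances from the average of y. The data and the helper name `fit_line` are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one regressor; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope

# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

b0, b1 = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)

# Residual: actual value minus the model's prediction at that point.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
# Deviation: actual value minus the average of y (ignores the model).
deviations = [y - mean_y for y in ys]

print(residuals)
print(deviations)
```

With an intercept in the model, the residuals sum to zero, and here they are much smaller than the deviations because the line explains most of the variation.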
In a quadratic regression model the derivative can be interpreted as the increase in y when x changes by one unit
True, approximately. The derivative depends on the value of x at which it is evaluated.
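A small numeric check of this card, using hypothetical coefficients: the derivative b1 + 2*b2*x is close to, but not exactly, the change in y for a one-unit step in x.

```python
def quadratic(x, b0, b1, b2):
    return b0 + b1 * x + b2 * x ** 2

def marginal_effect(x, b1, b2):
    # Derivative of b0 + b1*x + b2*x^2 with respect to x.
    return b1 + 2 * b2 * x

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 1.0, 2.0, -0.1
x = 5.0

derivative = marginal_effect(x, b1, b2)
exact_change = quadratic(x + 1, b0, b1, b2) - quadratic(x, b0, b1, b2)

print(derivative)    # slope of the curve at x = 5
print(exact_change)  # actual change in y from x = 5 to x = 6
```

The gap between the two equals b2, so the approximation is better the smaller the curvature.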
In a linear probability model one can count on the predictions being restricted to between 0 and 1
False, a line continues to infinity in both directions, so predicted probabilities can fall below 0 or above 1.
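A sketch of why the linear probability model is not bounded, using hypothetical coefficients for employment regressed on years of education. Each extra year adds 0.05, that is 5 percentage points, regardless of the starting level.

```python
def lpm_predict(x, b0, b1):
    # Linear probability model: predicted Pr(Y = 1 | x) is just a line.
    return b0 + b1 * x

# Hypothetical coefficients, for illustration only.
b0, b1 = 0.2, 0.05

for years in (0, 10, 20):
    print(years, lpm_predict(years, b0, b1))
```

At 20 years the prediction exceeds 1, which cannot be a probability.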
In which regression models does B1/100 represent a UNIT change in y for a 1% change in x: linear, log-log, exponential or logarithmic?
Only logarithmic (y regressed on ln x)
In the log-log model the coefficient is…
The expected percentage change in y when x changes by 1% (the elasticity)
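The two interpretations above can be verified numerically. In this sketch the coefficients and the starting value of x are hypothetical; `lin_log` stands for the logarithmic model y = b0 + b1*ln(x) and `log_log` for ln(y) = b0 + b1*ln(x).

```python
import math

def lin_log(x, b0, b1):
    # Logarithmic (linear-log) model: y = b0 + b1 * ln(x)
    return b0 + b1 * math.log(x)

def log_log(x, b0, b1):
    # Log-log model: ln(y) = b0 + b1 * ln(x), solved for y
    return math.exp(b0 + b1 * math.log(x))

x = 200.0  # hypothetical starting value

# Logarithmic model with b1 = 50: a 1% rise in x changes y by about b1/100 units.
unit_change = lin_log(1.01 * x, 1.0, 50.0) - lin_log(x, 1.0, 50.0)

# Log-log model with b1 = 0.8: a 1% rise in x changes y by about 0.8 percent.
pct_change = (log_log(1.01 * x, 1.0, 0.8) / log_log(x, 1.0, 0.8) - 1) * 100

print(unit_change)  # close to 0.5
print(pct_change)   # close to 0.8
```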
If the correlation is -0.87 and one variable increases by 1, the expected change in the other variable is -0.87
False, because correlation does not measure the slope of their relationship but the strength of their covariance. The only thing we learn is that the relationship is a strong negative one.
If the given regression model is y = b1x1 + b2x2 + b3x3 + b0 + e, how do you make a restricted function for a restriction test?
For the restriction b1 = b2: y = b1(x1 + x2) + b3x3 + b0 + e
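Once both the restricted and the unrestricted model are estimated, the restriction is usually tested with an F-statistic built from their sums of squared residuals. The sketch below uses the homoskedasticity-only formula; the function name and all numbers are hypothetical and not taken from the exercise.

```python
def restriction_f_stat(ssr_restricted, ssr_unrestricted, q, n, k):
    """Homoskedasticity-only F-statistic.

    q: number of restrictions
    n: number of observations
    k: number of regressors in the unrestricted model (intercept excluded)
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical values: one restriction, three regressors, 104 observations.
f_stat = restriction_f_stat(
    ssr_restricted=120.0, ssr_unrestricted=100.0, q=1, n=104, k=3
)
print(f_stat)
```

A large value means the restricted model fits noticeably worse, which is evidence against the restriction.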
If you answer that one is 2.5 percent more likely to be employed for each year of education in a linear regression question, why are you automatically wrong?
Because a percentage change implies a logarithmic relationship. In a linear model the change is in percentage points, so say percentage points instead.
How do you write a hypothesis test of joint significance?
H0: B1 = B2 = … = Bn = 0
HA: at least one B is not 0
What does the logit model look like, and what if the beta is 0.5?
The logit curve is S-shaped. The beta multiplies x inside the exponent, in both the numerator and the denominator: with beta = 0.5, Pr(Y=1|x) = e^(b0 + 0.5x) / (1 + e^(b0 + 0.5x)).
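A minimal sketch of the logit curve, assuming an intercept of 0 (a hypothetical choice) and the beta of 0.5 from the card. Printing a few points shows the S shape: flat near 0, steep in the middle, flat near 1.

```python
import math

def logit_prob(x, b0, b1):
    # The same exponent appears in the numerator and the denominator.
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

b0, b1 = 0.0, 0.5  # b0 is an assumed value; b1 = 0.5 as on the card

for x in (-20, -4, 0, 4, 20):
    print(x, round(logit_prob(x, b0, b1), 4))
```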
Which assumptions are required for the OLS estimator to be unbiased?
The model needs to be linear in the parameters, it needs to fulfill E(u|x) = 0, and there must be no perfect collinearity.
Which assumptions are required for the OLS standard errors, and thus the t and F tests, to be valid?
That the observations are independent draws from an identical distribution (i.i.d.) and that there are no large outliers.
Which assumptions are required for the OLS estimator to be BLUE, the best linear unbiased estimator?
On top of the unbiasedness assumptions, the errors need to be homoskedastic for the Gauss-Markov theorem to hold. Normally distributed errors are not needed for BLUE itself; they are what makes the t and F tests exact in small samples.
Is the law of large numbers about consistency?
Yes, and consistency means that the probability of the estimator being arbitrarily close to the population value approaches 1 as the sample size increases.
The central limit theorem is about consistency
No, it is about the sampling distribution of the estimator approaching a normal distribution as the sample size increases.
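The difference between the two theorems can be seen in a small simulation. This sketch draws from a uniform distribution on [0, 1] (an arbitrary choice with mean 0.5 and variance 1/12) with a fixed seed; the sample sizes are assumptions picked for speed.

```python
import random

random.seed(0)

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

# Law of large numbers: the sample mean gets close to 0.5 as n grows.
mean_small = sample_mean(10)
mean_large = sample_mean(100_000)
print(mean_small, mean_large)

# Central limit theorem: means of n = 30 draws are roughly normal, so about
# 95% of them should fall within 1.96 standard errors of 0.5.
n = 30
standard_error = (1 / 12) ** 0.5 / n ** 0.5
means = [sample_mean(n) for _ in range(5000)]
share_inside = sum(abs(m - 0.5) <= 1.96 * standard_error for m in means) / len(means)
print(share_inside)
```

The first part is about where the estimator ends up; the second is about the shape of its distribution on the way there.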
i.i.d leads to consistency
True
What makes an estimator good?
It is unbiased: E(estimator) = population value. It is consistent: Pr(|estimator - population value| < ε) -> 1 as n -> ∞ for any ε > 0. And it is efficient: its variance is small.
What may cause E(u|X) = 0 (no endogeneity) to not hold?
If there is a correlation or a nonlinear relation between x and the error term, or if there is a third factor that influences both x and y.
If you want your estimator to be efficient, should you pick a sample in which X has a larger or smaller variance?
A larger variance, because the variance of b decreases as the variance of X increases. It also decreases as the sample size n increases.
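Both effects on this card follow from the textbook variance of the slope in a simple regression, sigma^2 divided by the sum of squared deviations of x. The sketch assumes homoskedastic errors with variance `sigma2`; the x samples are made up.

```python
def slope_variance(xs, sigma2):
    # Var(b1) = sigma^2 / sum((x_i - mean(x))^2), assuming homoskedastic errors.
    mean_x = sum(xs) / len(xs)
    return sigma2 / sum((x - mean_x) ** 2 for x in xs)

# Hypothetical designs with the same mean of x.
narrow = [4.0, 5.0, 6.0] * 10      # little spread in x, n = 30
wide = [0.0, 5.0, 10.0] * 10       # more spread in x, n = 30
wide_bigger = wide * 4             # same spread, n = 120

print(slope_variance(narrow, 1.0))
print(slope_variance(wide, 1.0))
print(slope_variance(wide_bigger, 1.0))
```

Each step down the list gives a smaller variance for the slope estimate.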
The requirement for an omitted variable to cause bias is
Correlation with model variables and an effect on the response variable
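Both requirements can be seen in a seeded simulation. In this sketch the true model is y = 1*x1 + 2*x2 + u, and x2 is built to be correlated with x1 (x2 = 0.5*x1 + noise); all coefficients and the sample size are hypothetical. Leaving x2 out pushes the estimated slope on x1 toward 1 + 2*0.5 = 2 instead of 1.

```python
import random

random.seed(1)
n = 20_000

x1 = [random.gauss(0, 1) for _ in range(n)]
# The omitted variable is correlated with x1 ...
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
# ... and has its own effect on y.
y = [1.0 * a + 2.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def simple_slope(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )

# Regression of y on x1 alone, with x2 omitted.
short_slope = simple_slope(x1, y)
print(short_slope)
```

If either the 0.5 or the 2.0 were set to zero, the slope would be close to the true value of 1 again.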
What is an ideal randomized controlled experiment?
Ideal: subjects follow the protocol perfectly. Randomized: subjects are sampled i.i.d. and assigned to treatment at random. Controlled: there is a control group that does not receive the treatment. And lastly an experiment in which people do not volunteer, since volunteering would cause selection bias and reverse causality.
What will large but imperfect multicollinearity lead to?
Large standard errors for one or more of the regressors
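One common way to quantify this is the variance inflation factor, 1 / (1 - R_j^2), where R_j^2 is the R² from regressing regressor j on the other regressors. The term and the function name are not from the exercise; the sketch assumes homoskedastic errors and uses hypothetical R² values.

```python
def variance_inflation_factor(r_squared_j):
    # Factor by which Var(b_j) grows compared with uncorrelated regressors.
    return 1 / (1 - r_squared_j)

for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, variance_inflation_factor(r2))
```

The standard error grows with the square root of this factor, and the factor is undefined at R_j^2 = 1, which is the perfect collinearity case.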
How can you make the OLS estimator as good as possible, that is, make it artificially BLUE?
You can use robust standard errors to neutralize the adverse effect of heteroskedasticity on inference when the homoskedasticity condition of the Gauss-Markov theorem fails, and you can rely on the central limit theorem to make the estimator approximately normally distributed when n is large.
Are the coefficients in a multiple regression model generally independently distributed?
No. The regressors are often somewhat correlated, so the coefficient estimates are correlated too and their separate effects on y become muddled.
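The F test assumes homoskedasticity
True for the standard (homoskedasticity-only) F-statistic. A heteroskedasticity-robust version exists.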
How do you make a hypothesis test to check whether a relationship is polynomial of an order higher than one?
You run an F test on the coefficients of the higher-order terms, with the null hypothesis that they are all zero.
Can adding a non-linear regressor introduce multicollinearity?
Yes, although it is usually not perfect because the relation between the regressors is not linear. An F-test of joint significance might be needed.
What is probit?
Using the cumulative standard normal distribution function C to model the probability of Y being 1 given x. Pr(Y=1|x) = C(b0+b1x), where b0+b1x is the z-value.
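A minimal sketch of the probit formula using the standard normal CDF from Python's standard library. The coefficients b0 = -1 and b1 = 0.5 are hypothetical.

```python
from statistics import NormalDist

def probit_prob(x, b0, b1):
    # z-value first, then the standard normal CDF turns it into a probability.
    z = b0 + b1 * x
    return NormalDist().cdf(z)

b0, b1 = -1.0, 0.5  # assumed values, for illustration only

for x in (-4, 0, 2, 4, 8):
    print(x, round(probit_prob(x, b0, b1), 4))
```

At x = 2 the z-value is 0, so the predicted probability is exactly one half.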
logit gives similar results to probit
True
Is R² meaningful in probability models?
No