Terms Flashcards
explanatory variable
IV = predictor = regressor
response variable
DV = outcome = dependent variable
Gaussian distribution
normal distribution. It is a continuous probability distribution for a real-valued random variable.
stratum
a subset of the population that is being sampled
stratification
process of dividing members of the population into homogenous subgroups before sampling
Coefficient of determination
R squared (R²): the proportion of variance in the response variable that is explained by the model
pearson correlation coefficient
One of the most widely used correlation coefficients. Graphically, this can be understood as “how close is the data to the line of best fit?”
r = 1 is a perfect positive fit, r = 0 is no fit, r = -1 is a perfect negative fit
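As a sketch, the coefficient can be computed directly from its definition (covariance divided by the product of the standard deviations); the function name is illustrative:

```python
import math

def pearson_r(x, y):
    # Pearson correlation: covariance of x and y divided by
    # the product of their standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For data lying exactly on an upward-sloping line this returns 1; on a downward-sloping line, -1.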
p-value
probability of the value of a test statistic being at least as extreme as the one observed in our data, under the null hypothesis
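A minimal sketch of this idea for a z statistic, assuming a standard-normal null distribution (function names are illustrative): the two-sided p-value is the probability mass at least as far from zero as the observed statistic.

```python
import math

def normal_cdf(z):
    # standard normal CDF, written in terms of the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p(z):
    # probability, under the null, of a statistic at least
    # as extreme (in either direction) as the observed |z|
    return 2 * (1 - normal_cdf(abs(z)))
```

For example, z = 1.96 gives a p-value of about 0.05, the conventional significance threshold.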
How does beta behave if the estimates are consistent?
Betas converge to the true values with increasing sample size
heteroscedasticity
refers to the circumstances in which the variability of a variable is unequal across the range of a second variable that predicts it
homoscedasticity
= having the same variance.
logarithmic scale
= log scale
Exponential growth curves are often displayed on a log scale; otherwise they would increase too quickly to fit within a small graph
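A small numeric sketch of why this works: taking logs turns an exponential curve into a straight line (variable names are illustrative):

```python
import math

# exponential growth: y doubles at every step, y = 2 ** t
ts = range(6)
ys = [2 ** t for t in ts]

# on a log2 scale the same curve is a straight line:
# log2(2 ** t) = t, so the points climb by a constant step
logs = [math.log2(y) for y in ys]
```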
cross-entropy
- commonly used in ML as a loss function
- calculates the difference between two probability distributions for a given random variable
- equals the entropy of the true distribution plus the KL divergence between the two distributions
- Cross-entropy builds upon the idea of entropy from information theory and calculates the number of bits required to represent or transmit an average event from one distribution compared to another distribution.
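The bits-per-event idea above can be sketched directly from the definition H(p, q) = -Σ p(x) log₂ q(x) (function name is illustrative):

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum over events of p(x) * log2(q(x)), in bits;
    # events with p(x) == 0 contribute nothing, so skip them
    # (note: q(x) == 0 where p(x) > 0 would make this infinite)
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)
```

When p and q are the same distribution, this reduces to the plain entropy of p; a mismatched q always costs at least as many bits.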
Sensitivity
= True Positive Rate
refers to the proportion of those who have the condition that received a positive result on the test
Specificity
= True Negative Rate
refers to the proportion of those who do not have the condition that received a negative result on this test
Sensitivity vs. Specificity
For all testing, both diagnostic and screening, there is usually a trade-off between sensitivity and specificity, such that higher sensitivities will mean lower specificities and vice versa.
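Both rates can be sketched from confusion-matrix counts (function names are illustrative):

```python
def sensitivity(tp, fn):
    # true positive rate: of those who have the condition
    # (TP + FN), the fraction the test correctly flags (TP)
    return tp / (tp + fn)

def specificity(tn, fp):
    # true negative rate: of those who do not have the condition
    # (TN + FP), the fraction the test correctly clears (TN)
    return tn / (tn + fp)
```

Lowering a test's decision threshold typically raises sensitivity (fewer missed cases) while lowering specificity (more false alarms), which is the trade-off described above.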
Standard Error
tells you how close the mean of any given sample from that population is likely to be to the true population mean. When the standard error increases, i.e. the sample means are more spread out, any given sample mean becomes a less accurate representation of the true population mean.
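A minimal sketch, assuming the usual formula SE = s / √n with s the sample standard deviation (function name is illustrative):

```python
import math

def standard_error(sample):
    # SE of the mean = sample standard deviation / sqrt(n),
    # using the n - 1 (Bessel-corrected) denominator for s
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return s / math.sqrt(n)
```

Because of the √n in the denominator, larger samples give smaller standard errors: the sample mean pins down the population mean more tightly.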