Econometrics Flashcards
Acceptance region
The set of values of a test statistic for which the null hypothesis is accepted (is not rejected).
Adjusted R2 (R̄2)
A modified version of R2 that does not necessarily increase when a new regressor is added to the regression.
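As a minimal sketch (not from the flashcards themselves), the usual adjustment can be computed directly from R2, the sample size n, and the number of regressors k; the penalty term can pull the adjusted value down when an added regressor explains little:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a regression with n observations
    and k regressors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With the same R2, more regressors means a larger penalty.
print(adjusted_r2(0.50, 100, 2))   # about 0.49
print(adjusted_r2(0.50, 100, 10))  # smaller, about 0.44
```

The function names and example numbers here are illustrative, not taken from the text.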
ADL(p,q)
See autoregressive distributed lag model.
AIC
See information criterion.
Akaike information criterion
See information criterion.
Alternative hypothesis
The hypothesis that is assumed to be true if the null hypothesis is false. The alternative hypothesis is often denoted H1.
AR(p)
See autoregression.
ARCH
See autoregressive conditional heteroskedasticity.
Asymptotic distribution
The approximate sampling distribution of a random variable computed using a large sample. For example, the asymptotic distribution of the sample average is normal.
Asymptotic normal distribution
A normal distribution that approximates the sampling distribution of a statistic computed using a large sample.
Attrition
The loss of subjects from a study after assignment to the treatment or control group.
Augmented Dickey-Fuller (ADF) test
A regression-based test for a unit root in an AR(p) model.
Autocorrelation
The correlation between a time series variable and its lagged value. The jth autocorrelation of Y is the correlation between Yt and Yt−j.
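A minimal sketch of how the jth sample autocorrelation can be computed (names and data are hypothetical, not from the text):

```python
def autocorrelation(y, j):
    """Sample jth autocorrelation of series y: corr(Y_t, Y_{t-j}),
    using the usual divisor n for both numerator and denominator."""
    n = len(y)
    ybar = sum(y) / n
    cov_j = sum((y[t] - ybar) * (y[t - j] - ybar) for t in range(j, n)) / n
    var = sum((v - ybar) ** 2 for v in y) / n
    return cov_j / var

y = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0]
print(autocorrelation(y, 1))  # positive: adjacent values move together
```

By construction the 0th autocorrelation is exactly 1.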
Autocovariance
The covariance between a time series variable and its lagged value. The jth autocovariance of Y is the covariance between Yt and Yt−j.
Autoregression
A linear regression model that relates a time series variable to its past (that is, lagged) values. An autoregression with p lagged values as regressors is denoted AR(p).
Autoregressive conditional heteroskedasticity (ARCH)
A time series model of conditional heteroskedasticity.
Autoregressive distributed lag model
A linear regression model in which the time series variable Yt is expressed as a function of lags of Yt and of another variable, Xt. The model is denoted ADL(p,q), where p denotes the number of lags of Yt and q denotes the number of lags of Xt.
Average causal effect
The population average of the individual causal effects in a heterogeneous population. Also called the average treatment effect.
Balanced panel
A panel data set with no missing observations, that is, in which the variables are observed for each entity and each time period.
Base specification
A baseline or benchmark regression specification that includes a set of regressors chosen using a combination of expert judgment, economic theory, and knowledge of how the data were collected.
Bayes information criterion
See information criterion.
Bernoulli distribution
The probability distribution of a Bernoulli random variable.
Bernoulli random variable
A random variable that takes on two values, 0 and 1.
Best linear unbiased estimator
An estimator that has the smallest variance of any estimator that is a linear function of the sample values Y and is unbiased. Under the Gauss-Markov conditions, the OLS estimator is the best linear unbiased estimator of the regression coefficients conditional on the values of the regressors.
Bias
The expected value of the difference between an estimator and the parameter that it is estimating. If μ̂Y is an estimator of μY, then the bias of μ̂Y is E(μ̂Y) − μY.
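A small simulation sketch (hypothetical, not from the text) can make bias concrete: the sample variance with divisor n is biased downward, while the divisor n − 1 version is unbiased:

```python
import random

random.seed(0)
# Population: uniform on [0, 1], whose true variance is 1/12.
true_var = 1 / 12
n, reps = 5, 20000

biased_sum = unbiased_sum = 0.0
for _ in range(reps):
    sample = [random.random() for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased_sum += ss / n          # divisor n: biased downward
    unbiased_sum += ss / (n - 1)  # divisor n - 1: unbiased

bias_n = biased_sum / reps - true_var
bias_n1 = unbiased_sum / reps - true_var
print(bias_n)   # noticeably negative (around -1/60 here)
print(bias_n1)  # close to zero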
BIC
See information criterion.
Binary variable
A variable that is either 0 or 1. A binary variable is used to indicate a binary outcome. For example, X is a binary (or indicator, or dummy) variable for a person's gender if X = 1 if the person is female and X = 0 if the person is male.
Bivariate normal distribution
A generalization of the normal distribution to describe the joint distribution of two random variables.
BLUE
See best linear unbiased estimator.
Break date
The date of a discrete change in population time series regression coefficient(s).
Causal effect
The expected effect of a given intervention or treatment as measured in an ideal randomized controlled experiment.
Central limit theorem
A result in mathematical statistics that says that, under general conditions, the sampling distribution of the standardized sample average is well approximated by a standard normal distribution when the sample size is large.
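As an illustrative simulation (assumptions mine, not from the text): even for a skewed population such as the exponential distribution, standardized sample averages behave approximately like a standard normal, so roughly 95% fall within ±1.96:

```python
import random, math

random.seed(1)
n, reps = 200, 10000
mu, sigma = 1.0, 1.0  # mean and standard deviation of an Exp(1) population

z = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    ybar = sum(sample) / n
    # Standardize the sample average: (Ybar - mu) / (sigma / sqrt(n)).
    z.append((ybar - mu) / (sigma / math.sqrt(n)))

share = sum(abs(v) < 1.96 for v in z) / reps
print(share)  # close to 0.95
```

The choices n = 200 and Exp(1) are arbitrary; any population with finite variance would do.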
Chi-squared distribution
The distribution of the sum of m squared independent standard normal random variables. The parameter m is called the degrees of freedom of the chi-squared distribution.
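A quick simulation sketch of the definition (parameters are illustrative): summing m squared independent standard normal draws yields a chi-squared(m) variable, whose mean is m:

```python
import random

random.seed(2)
m, reps = 3, 20000
# Each draw is a sum of m squared independent standard normals.
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(m)) for _ in range(reps)]
mean = sum(draws) / reps
print(mean)  # close to m = 3
```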
Chow test
A test for a break in a time series regression at a known break date.
Coefficient of determination
See R2.
Cointegration
When two or more time series variables share a common stochastic trend.
Common trend
A trend shared by two or more time series.
Conditional distribution
The probability distribution of one random variable given that another random variable takes on a particular value.
Conditional expectation
The expected value of one random variable given that another random variable takes on a particular value.
Conditional heteroskedasticity
The situation in which the variance, usually of an error term, depends on other variables.
Conditional mean
The mean of a conditional distribution; see conditional expectation.
Conditional mean independence
The conditional expectation of the regression error ui, given the regressors, depends on some but not all of the regressors.
Conditional variance
The variance of a conditional distribution.
Confidence interval (or confidence set)
An interval (or set) that contains the true value of a population parameter with a prespecified probability when computed over repeated samples.
Confidence level
The prespecified probability that a confidence interval (or set) contains the true value of the parameter.
Consistency
The property that an estimator converges in probability to the value of the parameter it is estimating; see consistent estimator.
Consistent estimator
An estimator that converges in probability to the parameter that it is estimating.
Constant regressor
The regressor associated with the regression intercept; this regressor is always equal to 1.
Constant term
The regression intercept.
Continuous random variable
A random variable that can take on a continuum of values.
Control group
The group that does not receive the treatment or intervention in an experiment.
Control variable
Another term for a regressor; more specifically, a regressor that controls for one of the factors that determine the dependent variable.
Convergence in distribution
When a sequence of distributions converges to a limit; a precise definition is given in Section 17.2.
Convergence in probability
When a sequence of random variables converges to a specific value; for example, when the sample average becomes close to the population mean as the sample size increases; see Key Concept 2.6 and Section 17.2.
Correlation
A unit-free measure of the extent to which two random variables move, or vary, together. The correlation (or correlation coefficient) between X and Y is σXY/(σXσY) and is denoted corr(X,Y).
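A minimal sketch of the sample analogue of this formula (names and data are hypothetical): divide the sample covariance by the product of the sample standard deviations:

```python
def correlation(x, y):
    """Sample correlation: cov(X, Y) / (sd(X) * sd(Y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return sxy / (sx * sy)

x = [1, 2, 3, 4, 5]
print(correlation(x, [2 * v + 1 for v in x]))  # 1.0: exact positive linear relation
print(correlation(x, [-v for v in x]))         # -1.0: exact negative linear relation
```

Being unit-free, the result is unchanged if x or y is rescaled.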
Correlation coefficient
See correlation.
Covariance
A measure of the extent to which two random variables move together. The covariance between X and Y is the expected value E[(X − μX)(Y − μY)] and is denoted cov(X,Y) or σXY.
Covariance matrix
A matrix composed of the variances and covariances of a vector of random variables.
Critical value
The value of a test statistic for which the test just rejects the null hypothesis at the given significance level.
Cross-sectional data
Data collected for different entities in a single time period.
Cubic regression model
A nonlinear regression function that includes X, X2, and X3 as regressors.
Cumulative distribution function (c.d.f.)
See cumulative probability distribution.
Cumulative dynamic multiplier
The cumulative effect of a unit change in the time series variable X on Y. The h-period cumulative dynamic multiplier is the effect of a unit change in Xt on Yt + Yt+1 + … + Yt+h.
Cumulative probability distribution
A function showing the probability that a random variable is less than or equal to a given number.
Dependent variable
The variable to be explained in a regression or other statistical model; the variable appearing on the left-hand side in a regression.
Deterministic trend
A persistent long-term movement of a variable over time that can be represented as a nonrandom function of time.
Dickey-Fuller test
A method for testing for a unit root in a first order autoregression [AR(1)].
Differences estimator
An estimator of the causal effect constructed as the difference in the sample average outcomes between the treatment and control groups.
Differences-in-differences estimator
The average change in Y for those in the treatment group, minus the average change in Y for those in the control group.
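A minimal sketch of the computation with hypothetical group outcomes (the function name and numbers are mine, not from the text):

```python
def diff_in_diff(treat_before, treat_after, ctrl_before, ctrl_after):
    """Differences-in-differences: (change in treatment-group mean)
    minus (change in control-group mean)."""
    mean = lambda v: sum(v) / len(v)
    return (mean(treat_after) - mean(treat_before)) - (
        mean(ctrl_after) - mean(ctrl_before)
    )

# Hypothetical data: both groups trend up by 1; treatment adds 2 more.
print(diff_in_diff([10, 12], [13, 15], [8, 10], [9, 11]))  # 2.0
```

Subtracting the control group's change removes the common trend shared by both groups.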
Discrete random variable
A random variable that takes on discrete values.
Distributed lag model
A regression model in which the regressors are current and lagged values of X.
Dummy variable
See binary variable.
Dummy variable trap
A problem caused by including a full set of binary variables in a regression together with a constant regressor (intercept), leading to perfect multicollinearity.
Dynamic causal effect
The causal effect of one variable on current and future values of another variable.
Dynamic multiplier
The h-period dynamic multiplier is the effect of a unit change in the time series variable Xt on Yt+h.
Endogenous variable
A variable that is correlated with the error term.
Error term
The difference between Y and the population regression function, denoted by u in this textbook.
Errors-in-variables bias
The bias in an estimator of a regression coefficient that arises from measurement errors in the regressors.
Estimate
The numerical value of an estimator computed from data in a specific sample.
Estimator
A function of a sample of data to be drawn randomly from a population. An estimator is a procedure for using sample data to compute an educated guess of the value of a population parameter, such as the population mean.
Exact distribution
The exact probability distribution of a random variable.
Exact identification
When the number of instrumental variables equals the number of endogenous regressors.
Exogenous variable
A variable that is uncorrelated with the regression error term.
Expected value
The long-run average value of a random variable over many repeated trials or occurrences. It is the probability-weighted average of all possible values that the random variable can take on. The expected value of Y is denoted E(Y) and is also called the expectation of Y.
Experimental data
Data obtained from an experiment designed to evaluate a treatment or policy or to investigate a causal effect.
Experimental effect
When experimental subjects change their behavior because they are part of an experiment.
Explained sum of squares (ESS)
The sum of squared deviations of the predicted values of Yi, Ŷi, from their average; see Equation (4.14).
Explanatory variable
See regressor.
External validity
Inferences and conclusions from a statistical study are externally valid if they can be generalized from the population and the setting studied to other populations and settings.
F-statistic
A statistic used to test a joint hypothesis concerning more than one of the regression coefficients.
Fm,n distribution
The distribution of a ratio of independent random variables, where the numerator is a chi-squared random variable with m degrees of freedom, divided by m, and the denominator is a chi-squared random variable with n degrees of freedom divided by n.
Fm,∞ distribution
The distribution of a random variable with a chi-squared distribution with m degrees of freedom, divided by m.
Feasible GLS
A version of the generalized least squares (GLS) estimator that uses an estimator of the conditional variance of the regression errors and covariance between the regression errors at different observations.
Feasible WLS
A version of the weighted least squares (WLS) estimator that uses an estimator of the conditional variance of the regression errors.
First difference
The first difference of a time series variable Yt is Yt − Yt−1, denoted ΔYt.
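A one-line sketch of the operation (data values are hypothetical); note the differenced series is one observation shorter than the original:

```python
def first_difference(y):
    """First differences of a time series: dY_t = Y_t - Y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

print(first_difference([100, 103, 101, 106]))  # [3, -2, 5]
```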
First-stage regression
The regression of an included endogenous variable on the included exogenous variables, if any, and the instrumental variable(s) in two stage least squares.
Fitted values
See predicted values.
Fixed effects
Binary variables indicating the entity or time period in a panel data regression.
Fixed effects regression model
A panel data regression that includes entity fixed effects.
Forecast error
The difference between the value of the variable that actually occurs and its forecasted value.
Forecast interval
An interval that contains the future value of a time series variable with a prespecified probability.
Functional form misspecification
When the form of the estimated regression function does not match the form of the population regression function; for example, when a linear specification is used but the true population regression function is quadratic.
GARCH
See generalized autoregressive conditional heteroskedasticity.
Gauss-Markov theorem
Mathematical result stating that, under certain conditions, the OLS estimator is the best linear unbiased estimator of the regression coefficients conditional on the values of the regressors.
Generalized autoregressive conditional heteroskedasticity
A time series model for conditional heteroskedasticity.
Generalized least squares (GLS)
A generalization of OLS that is appropriate when the regression errors have a known form of heteroskedasticity (in which case GLS is also referred to as weighted least squares, WLS) or a known form of serial correlation.
Generalized method of moments
A method for estimating parameters by fitting sample moments to population moments that are functions of the unknown parameters. Instrumental variables estimators are an important special case.