ECO 441 General Flashcards
Adjusted R2
The coefficient of determination adjusted for the number of regressors in the model. Unlike R², it penalizes the inclusion of regressors that do not improve the fit: R̄² = 1 − (1 − R²)(n − 1)/(n − k), where n is the number of observations and k is the number of parameters estimated (including the intercept).
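As a quick check, a minimal Python sketch (the simulated data and variable names are illustrative, not from the course) that applies the formula and compares it with the built-in value from statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = sm.add_constant(rng.normal(size=(n, 2)))  # intercept + 2 regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

results = sm.OLS(y, X).fit()
k = X.shape[1]  # number of estimated parameters, including the intercept
adj_r2 = 1 - (1 - results.rsquared) * (n - 1) / (n - k)
print(adj_r2, results.rsquared_adj)  # the two values agree
```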
Regression through the origin
A regression model with no intercept parameter; the fitted line is forced through the origin, so the model is a function of the slope parameter and the regressor alone: Y = β₁X + ε.
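A minimal sketch with statsmodels on simulated data: omitting the constant column forces the fitted line through the origin, while adding it estimates an intercept as well.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=50)
y = 2.0 * x + rng.normal(size=50)  # true relationship has no intercept

through_origin = sm.OLS(y, x).fit()               # no constant column: line through (0, 0)
with_intercept = sm.OLS(y, sm.add_constant(x)).fit()
print(through_origin.params, with_intercept.params)
```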
Perfect multicollinearity
Perfect multicollinearity occurs when one independent variable in a regression model can be perfectly predicted from the others. In this situation OLS cannot produce unique estimates of the affected coefficients: they are indeterminate and their standard errors are infinite. Perfect multicollinearity is often due to data entry errors or model misspecification, and it can typically be resolved by removing one of the perfectly correlated variables from the model or by combining the correlated variables in some way.
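A small numpy sketch on made-up data showing why OLS breaks down: when one column of X is an exact multiple of another, the X′X matrix is singular and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=30)
x2 = 3 * x1  # x2 is an exact linear function of x1: perfect multicollinearity
X = np.column_stack([np.ones(30), x1, x2])

# X'X is singular, so the normal equations have no unique solution
print(np.linalg.matrix_rank(X.T @ X))  # rank 2 < 3 columns
# np.linalg.solve(X.T @ X, X.T @ y) would raise LinAlgError here
```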
Heteroscedasticity
Heteroscedasticity refers to the situation where the variability of a variable is unequal across the range of values of a second variable that predicts it. This violates the assumption of constant variance of error terms and is frequently observed in cross-sectional data. Heteroscedasticity leads to inefficient estimates and incorrect standard errors, which can result in misleading hypothesis tests and confidence intervals. It can be detected using visual methods or formal tests like the White test. Econometricians often address heteroscedasticity by using robust standard errors or employing weighted least squares estimation.
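A sketch with statsmodels on simulated data whose error variance grows with x (the data and parameter values are illustrative): the White test flags the problem, and refitting with a robust covariance type corrects the standard errors.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, size=n)
X = sm.add_constant(x)
y = 1 + 0.5 * x + rng.normal(scale=x, size=n)  # error variance grows with x

results = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(results.resid, X)
print(f"White test p-value: {lm_pvalue:.4f}")  # small p-value: reject homoscedasticity

robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroscedasticity-robust standard errors
print(robust.bse)
```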
Autocorrelation
Autocorrelation occurs when the error terms in a regression model are correlated with each other over time or space. This violates the assumption of independence among error terms and is particularly common in time series data where patterns tend to persist over time. Autocorrelation can lead to inefficient estimates and unreliable standard errors, potentially compromising the validity of statistical inferences. It is often detected using tests such as the Durbin-Watson test, and addressing it may involve techniques like generalized least squares or including lagged variables in the model.
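A sketch using simulated AR(1) errors (the 0.8 persistence parameter is illustrative): the Durbin-Watson statistic from statsmodels falls well below 2 when the disturbances are positively autocorrelated.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
# build AR(1) errors: each disturbance carries over part of the previous one
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u

results = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(results.resid))  # well below 2 signals positive autocorrelation
```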
Parsimony principle
The principle that an econometric model should be kept as simple as possible: it should include only the variables needed to capture the essential features of the phenomenon under study (Occam's razor), with everything else relegated to the error term.
Goodness of fit
A measure of how well the sample regression line fits the observed data. In the two-variable model it is given by the coefficient of determination R², the square of the correlation coefficient.
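A quick numerical check on simulated data that, in the two-variable model, R² equals the square of the correlation coefficient:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 3 + 2 * x + rng.normal(size=100)

results = sm.OLS(y, sm.add_constant(x)).fit()
r = np.corrcoef(x, y)[0, 1]
print(results.rsquared, r**2)  # identical in simple (two-variable) regression
```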
Goals of econometric research
The primary goals of econometric research are:
1. Analysis: To understand and explain economic phenomena, such as the relationship between variables, the impact of policy interventions, and the behavior of economic agents. Econometric analysis aims to identify patterns, trends, and correlations in economic data to draw meaningful conclusions.
2. Forecasting: To predict future economic outcomes, such as GDP growth, inflation rates, or stock prices. Econometric models are used to forecast future events, allowing policymakers and businesses to make informed decisions.
3. Policy making: To evaluate the effectiveness of economic policies and interventions. Econometric research helps policymakers assess the impact of policy changes, such as the effect of tax reforms or monetary policy decisions on economic outcomes. This informs evidence-based decision-making and improves policy design.
These goals are interconnected, as analysis informs forecasting, and both analysis and forecasting inform policy making. By achieving these goals, econometric research contributes to a deeper understanding of the economy and better decision-making.
Assumptions of CLRM
- Linearity in Parameters: The relationship between the dependent variable (Y) and the independent variables (X) is linear in the coefficients. This means the coefficients appear only to the first power (no squares, cubes, etc.) and are not multiplied by each other.
- Fixed regressors: The values of the independent variables (X) are assumed to be fixed in repeated sampling. In other words, X is treated as non-stochastic or non-random.
- Zero conditional mean: The disturbance term (U) is assumed to have a zero mean or expected value, given the values of the independent variables (X). This means that the model errors are not systematically biased.
- Homoscedasticity: The disturbance term (U) is assumed to have constant variance (σ²) for all observations, given the values of the independent variables (X). This implies that the conditional variances of U are identical across observations.
- No autocorrelation: The disturbances (U) are assumed to be uncorrelated with each other. Specifically, the covariance (or correlation) between any two disturbances (U_i and U_j) is zero, given the values of the independent variables (X_i and X_j). This assumption means that there is no serial correlation or autocorrelation in the error terms.
- Sufficient observations: The number of observations n must be greater than the number of parameters to be estimated. In other words, n must be greater than the number of explanatory variables.
- Variability in X values: The X values in a given sample must not all be the same. Technically, Var(X) must be a finite positive number. If all the X values were identical (Xᵢ = X̄ for every observation), the slope coefficient could not be estimated.
- Correct specification: The regression model is correctly specified; there is no specification bias or error in the model used in empirical analysis. The classical econometric methodology assumes, implicitly if not explicitly, that the model used to test an economic theory is "correctly specified". An econometric investigation begins with the specification of the econometric model underlying the phenomenon of interest.
- No perfect multicollinearity: There is no perfect linear relationship among the explanatory variables.
These assumptions are crucial for the validity and efficiency of the classical linear regression model and the least-squares estimators. Violations of these assumptions can lead to biased, inefficient, or inconsistent parameter estimates, and may require the use of alternative estimation methods or corrective measures.
What is OLS?
Ordinary least squares (OLS), or linear least squares, is a statistical technique for estimating the unknown coefficients in a linear regression model by minimizing the sum of the squared residuals. The method is attributed to the German mathematician Carl Friedrich Gauss.
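A minimal numpy sketch of the idea on simulated data: the OLS estimates solve the normal equations β̂ = (X′X)⁻¹X′y, which minimize the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

# OLS minimizes the sum of squared residuals; the solution of the
# normal equations is beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.0, 2.0]
```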
Significance of the Stochastic disturbance term
The error term, denoted by the Greek letter mu (μ), occupies a critical position within econometric models. Its inclusion serves several important purposes:
1. Parsimony and Unobserved Variables: Economic theory may not comprehensively capture all the determinants influencing the dependent variable (Y). The error term (μ) acts as a receptacle for these omitted variables, enabling the construction of a parsimonious model that retains functionality. This is particularly relevant when theoretical frameworks remain incomplete or when data collection for certain variables proves impractical.
2. Functional Form Uncertainty: Even when the theory identifies the relevant explanatory variables (X), the precise functional form of the relationship between Y and X might be ambiguous. Is it a linear association, or does it exhibit a curvilinear pattern? The error term (μ) incorporates this uncertainty in the functional form, mitigating potential biases in model estimates.
3. Measurement Error and Missing Data: In some instances, researchers may recognize the existence of additional relevant variables, but data limitations might impede their inclusion. These limitations could stem from the inherent difficulty of measuring certain variables or the absence of readily available data. The error term (μ) serves as a proxy for these missing data points, acknowledging their potential influence on the dependent variable.
4. Inherent Randomness and Behavioral Vagueness: Economic theories, by their nature, may struggle to perfectly capture the intricacies of human behavior. For example, while the theory might posit a relationship between income (X) and consumption (Y), there could be numerous unobserved factors influencing consumption decisions. The error term (μ) accommodates this inherent randomness and acknowledges the limitations of economic theory in fully explaining human behavior.
5. Intrinsic Randomness: Even with a meticulously constructed model, there will always be some degree of inherent randomness or stochasticity in real-world data (Y) that defies complete explanation. The error term (μ) encapsulates this unavoidable element of chance, recognizing that not all variations in Y can be attributed solely to the included explanatory variables.
6. Measurement Error and Proxy Variables: Econometric analysis often relies on proxy variables, which serve as imperfect surrogates for the true theoretical concepts under investigation. For instance, studying the impact of permanent income on permanent consumption necessitates employing current income and consumption as proxies. The error term (μ) absorbs the measurement errors associated with using these proxies, acknowledging the potential discrepancies between the observed and true variables.
stochastic term
The stochastic term represents all the variables left out of the model that collectively affect Y. It accounts for the inherent randomness or unexplained variation in the dependent variable. Its inclusion makes the model more realistic and flexible, and capable of representing the inherent complexities and uncertainties in economic relationships.
Cross sectional data
Cross-sectional data refers to data collected at a single point in time, across different individuals, households, firms, or other units of observation. In cross-sectional data, each observation represents a unique entity.
Examples:
A survey of household incomes in a given year
Data on firm characteristics (size, industry, profits) for a specific year
Key characteristics:
No time dimension; data is collected at one specific point in time
Allows for analysis of differences across entities at a given point in time
Time series data
Time series data refers to data collected over multiple time periods for a single entity or a group of entities. In time series data, each observation represents a specific time period (e.g., year, quarter, month).
Examples:
Annual GDP data for a country over several years
Monthly stock prices for a particular company
Key characteristics:
Observations are ordered over time
Allows for analysis of trends, patterns, and dynamics over time
Pooled data
Pooled data is a combination of cross-sectional data and time series data. It means that you have observations for multiple individuals or entities (cross-sectional dimension) over multiple time periods (time series dimension).
For example, let’s say you have data on the income of 100 households for the years 2020, 2021, and 2022. In this case, you have cross-sectional data on 100 households, and time series data for 3 years.
When you combine these two dimensions, you get a pooled data set. So, for each household, you have income observations for multiple years.
Panel data
Panel data, also known as longitudinal data, is a type of pooled data where the same entities (individuals, households, firms, etc.) are observed over multiple time periods. In panel data, the observations for each entity are not independent, as they are linked through the entity’s unique identifier over time.
Examples:
Data on household incomes for the same set of households over several years
Data on firm characteristics for the same firms over multiple time periods
Key characteristics:
Observations are collected on the same entities over time
Allows for analysis of changes within entities over time, as well as differences across entities
Panel data can be further categorized into:
Balanced panel: All entities have observations for the same time periods
Unbalanced panel: Some entities have missing observations for certain time periods
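A small sketch of how a balanced panel might be laid out with pandas (the household IDs and income figures are invented for illustration):

```python
import pandas as pd

# A balanced panel: the same three households observed in each of two years.
panel = pd.DataFrame({
    "household": [1, 1, 2, 2, 3, 3],
    "year":      [2020, 2021, 2020, 2021, 2020, 2021],
    "income":    [40_000, 42_000, 55_000, 56_500, 31_000, 33_000],
}).set_index(["household", "year"])

print(panel.loc[1])                  # time series for household 1
print(panel.xs(2021, level="year"))  # cross-section for 2021
```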
Data types
Cross-sectional, time series, and pooled (a combination of time series and cross-sectional data).
Observational data
Observational data are collected through observations of individuals, households, firms, or other entities in their natural environments, without any intervention or manipulation by the researcher. In observational studies, the researcher does not control the assignment of individuals to different groups or the values of the independent variables.
Experimental data
Experimental data are obtained from controlled experiments, where the researcher has the ability to manipulate one or more independent variables (treatment variables) and observe their effect on the dependent variable(s). In experimental studies, participants are randomly assigned to different treatment groups, and the researcher controls as many factors as possible to isolate the causal effect of the treatment variable(s).
Correlation
Correlation analysis aims to measure the degree of association between variables, irrespective of whether they are dependent or explanatory variables. It does not distinguish between dependent and independent variables. Correlation analysis simply quantifies the strength and direction of the linear relationship between two variables.
The correlation coefficient (r) ranges from -1 to 1, where:
r = 1 indicates a perfect positive linear relationship
r = -1 indicates a perfect negative linear relationship
r = 0 indicates no linear relationship
Correlation analysis is symmetric, meaning the correlation between X and Y is the same as the correlation between Y and X.
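A quick numpy check of the symmetry property on simulated data (the 0.6 slope is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 0.6 * x + rng.normal(size=100)  # x and y are positively related

r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]
print(r_xy, r_yx)  # identical: correlation is symmetric, between -1 and 1
```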
Causation
Causation refers to the idea that one variable (the cause) directly influences or determines the value of another variable (the effect). To establish causation, researchers typically need to satisfy three main criteria:
1. Temporal precedence: The cause must precede the effect in time.
2. Covariation: There must be a systematic relationship (correlation) between the cause and the effect.
3. Elimination of alternative explanations: Other plausible explanations for the observed relationship must be ruled out.
Regression
Regression analysis is a method of fitting a mathematical model to a set of data, with the goal of quantifying the relationship between a dependent variable and one or more independent variables. The regression model describes the average relationship between the variables, allowing researchers to predict the value of the dependent variable based on the values of the independent variables.
The regression equation takes the form:
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε
Where:
Y is the dependent variable
X₁, X₂, …, Xₙ are the independent variables
β₀ is the constant or intercept term
β₁, β₂, …, βₙ are the regression coefficients, representing the change in Y associated with a unit change in the corresponding independent variable, holding other variables constant
ε is the error term, accounting for the variability in Y not explained by the independent variables
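A minimal statsmodels sketch fitting this equation to simulated data with two independent variables (the coefficient values are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 200
X = rng.normal(size=(n, 2))  # two independent variables
y = 1.0 + 0.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.params)                           # estimates of β₀, β₁, β₂
print(results.predict(sm.add_constant(X))[:5])  # fitted (predicted) values of Y
```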
Deterministic relationship
A deterministic relationship is one where the values of the dependent variable are completely determined by the values of the independent variables, without any randomness or unexplained variability. The dependent variable is an exact function of the independent variables, and there is no error term or random component.
In a deterministic relationship, the equation takes the form:
Y = f(X₁, X₂, …, Xₙ)
Where f(·) is a deterministic function that maps the values of the independent variables to the values of the dependent variable without any residual error.
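A trivial sketch of a deterministic relationship (the rectangle-area example is illustrative, not from the course): the same inputs always produce exactly the same output, with no error term.

```python
# A deterministic relationship: the output is fixed exactly by the inputs.
# Example: the area of a rectangle is completely determined by its sides.
def area(length: float, width: float) -> float:
    return length * width

print(area(3.0, 4.0))  # always 12.0, with no error term
```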
Statistical Relationships
A statistical relationship is one where the variables are related on average or through a probability distribution. It means that the relationship between the variables is not exact or determined with certainty, but rather, there is a tendency or pattern in the data. In a statistical relationship, the values of the dependent variable are not perfectly determined by the values of the independent variables, and there is always some degree of variability or randomness present.
In a statistical relationship, the regression equation takes the form:
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε
Where ε represents the random error term, accounting for the variability in Y that cannot be explained by the independent variables.
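A contrasting sketch (the parameter values are illustrative): at the same value of X, repeated draws of Y differ because of the random error term, so the relationship holds only on average.

```python
import numpy as np

rng = np.random.default_rng(9)
x = 5.0
# Two draws of Y at the same x differ because of the random error term ε:
for _ in range(2):
    eps = rng.normal(scale=2.0)
    y = 3.0 + 1.5 * x + eps
    print(y)  # varies around the average value 10.5
```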