Semester 1 Flashcards

1
Q

What is a random variable?

A

A random variable is a variable whose numerical value is determined by chance: it is the outcome of a random phenomenon.

2
Q

What is the difference between a discrete random variable and a continuous random variable?

A
  • A discrete random variable has a countable number of possible values
  • A continuous random variable can take on any value in an interval (e.g. time)
3
Q

What is a standardised variable?

A

A standardised variable, Z = (X - μ)/σ, measures how many standard deviations X is above or below its mean. Standardised random variables always have a mean of 0 and a standard deviation of 1.
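A minimal Python sketch of standardisation using the standard library `statistics` module. The data values are made up for illustration, and the population standard deviation (`pstdev`) is used so the resulting z-scores have a mean of 0 and a standard deviation of 1.

```python
from statistics import fmean, pstdev

def standardise(values):
    """Return z-scores: how many standard deviations each value lies above or below the mean."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values: mean 5, sd 2
z = standardise(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```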

4
Q

What does the central limit theorem state?

A

The central limit theorem states that if Z is a standardised sum of N independent, identically distributed (discrete or continuous) random variables with a finite, non-zero standard deviation, then the probability distribution of Z approaches the standard normal distribution as N increases.
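The theorem can be illustrated with a small Python simulation, a sketch under assumptions of our own choosing: each draw is the sum of N = 30 rolls of a fair die (a discrete, clearly non-normal distribution with mean 3.5 and variance 35/12), standardised with the exact mean and standard deviation of the sum. If the theorem holds, Z should have a mean near 0 and roughly 95% of its values should fall within 1.96 of 0, as for a standard normal variable.

```python
import math
import random
from statistics import fmean

def standardised_die_sum(n_rolls, rng):
    """Sum n_rolls fair-die rolls, then standardise using the exact mean and sd of the sum."""
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    mean = 3.5 * n_rolls               # E[sum] = N * 3.5
    sd = math.sqrt(n_rolls * 35 / 12)  # Var[sum] = N * 35/12
    return (total - mean) / sd

rng = random.Random(42)  # fixed seed so the run is reproducible
zs = [standardised_die_sum(30, rng) for _ in range(10_000)]
share_within = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"mean of Z: {fmean(zs):.3f}")                # near 0
print(f"share within 1.96 sd: {share_within:.3f}")  # near 0.95
```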

5
Q

What is statistical inference?

A

Using a sample to draw conclusions about the characteristics of the population from which it came.

6
Q

What is a biased sample?

A

A sample that differs systematically from the population that it is intended to represent.

7
Q

What is selection bias?

A

Bias that arises when the way the sample is selected systematically excludes or underrepresents certain groups. It often happens when we use a convenience sample.

8
Q

What is a retrospective study?

A

A study that looks at past data for a contemporaneously selected sample (e.g. an examination of the medical records of today's 65-year-olds). Such studies may suffer from survivor bias: members of the past population who are no longer around are excluded by default.

9
Q

What is a prospective study?

A

A study that selects a sample and then tracks its members over time. Such studies may suffer from non-response bias: the systematic refusal of some groups to participate in the study.

10
Q

What is a simple random sample?

A

A sample of size N taken from a given population in which each member of the population is equally likely to be included in the sample and every possible sample of size N from the population has an equal chance of being selected.
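In Python, `random.sample` draws without replacement so that every subset of the requested size is equally likely, which matches the definition above. The population of 100 numbered members, the sample size of 10 and the seed are illustrative assumptions.

```python
import random

population = list(range(1, 101))       # hypothetical population: members numbered 1 to 100
rng = random.Random(2024)              # seeded so the draw is reproducible
sample = rng.sample(population, k=10)  # simple random sample of size N = 10, no repeats
print(sample)
```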

11
Q

What is a parameter?

A

A characteristic of the population whose value is unknown but can be estimated

12
Q

What is an estimator?

A

A sample statistic that will be used to estimate the value of the population parameter

13
Q

What is sampling variation?

A

The notion that, because samples are chosen randomly, the sample average will vary from sample to sample around the population mean

14
Q

What is a sampling distribution?

A

The probability distribution of all possible values of a sample statistic (such as the sample mean) across all possible samples of a given size. Even if the population does not have a normal distribution, the sampling distribution of the sample mean approaches the normal distribution as the sample size increases.

15
Q

What is an unbiased estimator?

A

A sample statistic is an unbiased estimator of a population parameter if the mean of the sampling distribution of this statistic is equal to the value of the population parameter. We can gauge the accuracy of the estimator by examining the size of its standard deviation.
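An exact enumeration in Python, a sketch of our own construction: for an illustrative population {1, 2, 3} (population variance 2/3), list every equally likely ordered sample of size 2 drawn with replacement and average two candidate variance estimators over them. Dividing by n - 1 gives a sampling-distribution mean equal to the population variance (unbiased); dividing by n falls short (biased). `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # illustrative population
n = 2                   # sample size

def pop_variance(values):
    mu = Fraction(sum(values), len(values))
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(sample, divisor):
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

# Every ordered sample drawn with replacement (independent draws) is equally likely.
samples = list(product(population, repeat=n))
mean_unbiased = sum(sample_variance(s, n - 1) for s in samples) / len(samples)
mean_biased = sum(sample_variance(s, n) for s in samples) / len(samples)

print(pop_variance(population))  # 2/3
print(mean_unbiased)             # 2/3: equals the parameter, so unbiased
print(mean_biased)               # 1/3: systematically too small
```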

16
Q

What is the t-distribution?

A

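The sampling distribution of the variable that is created when the mean of a sample from a normal distribution is standardised using its standard error (estimated from the sample standard deviation). The exact shape of the t-distribution depends on the sample size through its degrees of freedom, N - 1; as N grows it approaches the standard normal distribution.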

17
Q

What are degrees of freedom?

A

The number of observations in the data that are free to vary when estimating statistical parameters.
Degrees of freedom = number of observations - number of estimated parameters (N - K - 1 in a regression with K independent variables and an intercept)

18
Q

What is a confidence interval?

A

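A confidence interval measures the reliability of an estimate. It gives a range within which we can say, with a stated % confidence, that the true value of the population parameter lies: if the procedure were repeated on many samples, that % of the intervals constructed would contain the parameter.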
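A Python sketch of a 95% confidence interval for a population mean, x̄ ± t_c · s/√N. To stay within the standard library, the critical value is passed in rather than computed: 2.365 is the two-tailed 5% critical t value for 7 degrees of freedom as listed in standard t tables (with SciPy, `scipy.stats.t.ppf(0.975, 7)` gives the same value). The data are illustrative.

```python
import math
from statistics import mean, stdev

def mean_confidence_interval(sample, t_crit):
    """Return (lower, upper) = mean +/- t_crit * s / sqrt(N)."""
    centre = mean(sample)
    half_width = t_crit * stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

data = [10.4, 11.2, 9.8, 10.9, 11.5, 10.1, 10.7, 11.0]   # illustrative data, N = 8
low, high = mean_confidence_interval(data, t_crit=2.365)  # t table value, df = N - 1 = 7
print(f"95% CI: ({low:.2f}, {high:.2f})")
```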

19
Q

What is econometrics?

A

The quantitative measurement and analysis of actual economic and business phenomena. It is used to describe economic reality, test hypotheses about economic theory and forecast future economic activity.

20
Q

What is regression analysis?

A

A statistical technique that attempts to explain movements in one variable (dependent) as a function of movements in a set of other variables (independent) through the quantification of a single equation

21
Q

What is B0?

A

The intercept term, also known as the constant. It is the expected value of Y when all the independent variables are equal to 0.

22
Q

What is B1?

A

The slope coefficient: the amount by which Y is expected to change when X increases by 1 unit, holding all other independent variables constant.

23
Q

What are potential sources of variation in Y?

A
  1. Other potentially important explanatory variables
  2. Measurement error
  3. Incorrect functional form
  4. Purely random and unpredictable occurrences
24
Q

What is the stochastic error term?

A

The stochastic error term encompasses all other sources of variation in Y that are not captured by the model

25
Q

What is the deterministic component of a regression equation?

A

B0 + B1X etc.: the expected value of Y given X, also known as the conditional expectation E(Y|X).

26
Q

How do we judge the fit of a regression equation?

A

The smaller the residuals e (the estimated error terms, e = Y - Ŷ), the closer the estimated values Ŷ are to the observed values of Y, so the better the fit.

27
Q

What is OLS?

A

Ordinary least squares (OLS) is an estimator that chooses the coefficient estimates to minimise the sum of the squared residuals, i.e. the squared vertical distances between the actual observed data and the estimated regression line.
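For a single independent variable, minimising the sum of squared residuals gives the closed-form estimates B1 = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and B0 = Ȳ - B1·X̄. A plain-Python sketch with illustrative data:

```python
from statistics import fmean

def ols_simple(x, y):
    """Estimate B0 and B1 for Y = B0 + B1*X + e by ordinary least squares."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]  # illustrative data
y = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1 = ols_simple(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"B0 = {b0:.3f}, B1 = {b1:.3f}")  # B0 = 0.090, B1 = 1.970
print(f"sum of squared residuals = {sum(e * e for e in residuals):.4f}")
```

With an intercept in the equation, the OLS residuals sum to zero, which makes a quick sanity check.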

28
Q

What is the decomposition of variance?

A

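The variation of Y around its mean (the total sum of squares, TSS) can be decomposed into 2 parts: the explained sum of squares, ESS (squared deviations of the estimated values from the mean), and the residual sum of squares, RSS (squared deviations of the actual values from the estimated values). TSS = ESS + RSS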
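In symbols, with Ŷi the estimated value of Yi and Ȳ the mean of Y (the identity holds for OLS with an intercept):

```latex
\underbrace{\sum_i (Y_i - \bar{Y})^2}_{\text{TSS}}
= \underbrace{\sum_i (\hat{Y}_i - \bar{Y})^2}_{\text{ESS}}
+ \underbrace{\sum_i (Y_i - \hat{Y}_i)^2}_{\text{RSS}},
\qquad e_i = Y_i - \hat{Y}_i
```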

29
Q

What is the coefficient of determination?

A

R^2 = ESS/TSS = 1 - RSS/TSS: the proportion of the variation in Y around its mean that is explained by the estimated regression equation
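A Python sketch that fits a one-variable OLS line to illustrative data, computes TSS, ESS and RSS, and checks that R^2 = ESS/TSS = 1 - RSS/TSS (the two forms agree because TSS = ESS + RSS for OLS with an intercept).

```python
from statistics import fmean

x = [1, 2, 3, 4, 5]  # illustrative data
y = [2.0, 4.1, 5.9, 8.2, 9.8]

# One-variable OLS fit (closed form)
x_bar, y_bar = fmean(x), fmean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation around the mean
ess = sum((fi - y_bar) ** 2 for fi in y_hat)            # part explained by the regression
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # part left in the residuals
r2 = ess / tss
print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}, R^2 = {r2:.4f}")
```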

30
Q

What is the adjusted coefficient of determination?

A

It is the share of the variation in Y around its mean that is explained by the regression equation, adjusted for degrees of freedom (N - K - 1, where N is the number of observations and K the number of independent variables)
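The usual formula, with N observations and K independent variables:

```latex
\bar{R}^2 = 1 - \frac{\text{RSS}/(N - K - 1)}{\text{TSS}/(N - 1)}
          = 1 - (1 - R^2)\,\frac{N - 1}{N - K - 1}
```

Unlike R^2, the adjusted value can fall when a variable is added, if the reduction in RSS does not offset the lost degree of freedom.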

31
Q

What is the simple correlation coefficient?

A

A measure between -1 and 1 of the strength and direction of the linear relationship between two variables
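A Python sketch of the Pearson (simple) correlation coefficient, r = Σ(X - X̄)(Y - Ȳ) / √(Σ(X - X̄)² · Σ(Y - Ȳ)²), with illustrative data. Python 3.10 and later also provide `statistics.correlation` for the same quantity.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Simple correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2.0, 4.1, 5.9, 8.2, 9.8]))  # strong positive linear relationship, close to 1
print(pearson_r(x, [10, 8, 6, 4, 2]))            # exactly linear and decreasing: -1.0
```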

32
Q

What are the 6 steps in applied regression analysis?

A
  1. Review literature and develop theoretical model
  2. Specify the model (independent variables and functional form)
  3. Hypothesise the expected signs of the coefficients
  4. Collect and inspect the data
  5. Estimate and evaluate the equation
  6. Document the results
33
Q

What is a dummy variable?

A

A variable that can only take a value of 0 or 1 to represent the absence or presence of a certain characteristic (e.g. biological gender)

34
Q

What is a cross-sectional data set?

A

A data set that observes many subjects at the same point in time

35
Q

What is a specification error?

A

An error in a statistical model due to at least one of the assumptions of the model being incorrect

36
Q

What is an omitted variable?

A

An important explanatory variable that is in the true regression model but is not in the estimated regression. Omitting it causes omitted variable bias in the estimated coefficients of the included variables that are correlated with it

37
Q

What is an irrelevant variable?

A

An extra variable included in the estimated regression equation that does not belong there.

38
Q

What is specification bias?

A

Bias resulting from an omitted variable. When a relevant variable is left out it cannot be held constant, so the coefficients of included variables that are correlated with it pick up part of its effect and are biased

39
Q

What are the four important specification criteria?

A
  1. Theory
  2. t-test - is the variable's estimated coefficient significant in the expected direction?
  3. Adjusted R squared - does the overall fit of the equation, adjusted for degrees of freedom, improve when the variable is added?
  4. Bias - do other variables' coefficients change significantly when the variable is added to the equation?
40
Q

What is expected bias?

A

The likely bias that omitting a particular variable would have caused in the estimated coefficient of one of the included variables.
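For the two-variable case, assuming the true model is Y = B0 + B1X1 + B2X2 + e and X2 is omitted, the standard omitted-variable result is:

```latex
E(\hat{\beta}_1) = \beta_1 + \beta_2\,\alpha_1,
\qquad \text{where } X_2 = \alpha_0 + \alpha_1 X_1 + u
```

So the expected bias is β2·α1, and its sign is the sign of β2 times the sign of the correlation between X1 and X2.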

41
Q

What is a dummy variable?

A

An independent variable that can only take the value of 0 or 1 to indicate the absence or presence of a characteristic in an observation

42
Q

What is goodness of fit?

A

How well the observed data corresponds to the assumed model. We can examine the goodness of fit using the adjusted coefficient of determination

43
Q

What is a null hypothesis?

A

A statement that is assumed to be true until the researcher has significant statistical evidence to reject it

44
Q

What is a t-test?

A

A t-test is used to test hypotheses about individual regression slope coefficients.
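The statistic for testing H0: βk = βH0 (usually βH0 = 0) in a regression with N observations and K independent variables:

```latex
t_k = \frac{\hat{\beta}_k - \beta_{H_0}}{SE(\hat{\beta}_k)},
\qquad \text{degrees of freedom} = N - K - 1
```

For a two-sided test, the null hypothesis is rejected if the absolute value of t_k exceeds the critical t value.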

45
Q

What is the F-test used for?

A
  1. Testing the overall significance of a model
  2. Testing the equivalence of regression coefficients between two sets
  3. Testing parameter stability (Chow test)
  4. Testing returns to scale
  5. Testing the joint significance of seasonal dummies
46
Q

How do you test the overall significance of a model?

A

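Use an F-test: compare the fit of the unconstrained equation with the fit of the constrained equation (all slope coefficients set to 0), then compare the resulting F statistic with the critical F value. If F is greater than or equal to the critical value, reject the null hypothesis that the slope coefficients are jointly 0.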
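A minimal Python sketch of the F statistic, F = [(RSS_M - RSS)/M] / [RSS/(N - K - 1)], where RSS_M comes from the constrained equation and M is the number of constraints. For overall significance the constrained equation is Y = B0 + e, so RSS_M = TSS and M = K. The sums of squares in the usage line are from an illustrative one-variable fit; the critical value still has to be read from an F table.

```python
def f_statistic(rss_constrained, rss_unconstrained, m, n, k):
    """F = ((RSS_M - RSS) / M) / (RSS / (N - K - 1)).

    rss_constrained: residual sum of squares of the constrained equation (RSS_M)
    rss_unconstrained: residual sum of squares of the full equation (RSS)
    m: number of constraints
    n, k: observations and independent variables in the full equation
    """
    numerator = (rss_constrained - rss_unconstrained) / m
    denominator = rss_unconstrained / (n - k - 1)
    return numerator / denominator

# Overall significance of a one-variable model with N = 5: RSS_M = TSS = 38.9, RSS = 0.091
f = f_statistic(rss_constrained=38.9, rss_unconstrained=0.091, m=1, n=5, k=1)
print(f"F = {f:.1f}")  # compare with the critical F value for (1, 3) degrees of freedom
```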

47
Q

What are Type 1 and Type 2 errors?

A

A Type 1 error is rejecting a true null hypothesis; the probability of committing a Type 1 error is the level of significance.
A Type 2 error is failing to reject a false null hypothesis.

48
Q

What is a p value?

A

A p value is the marginal significance level: the probability of observing a t-score as extreme as, or more extreme than, the one actually observed if the null hypothesis were true
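With many degrees of freedom the t distribution is close to the standard normal, so a two-sided p value can be approximated with the standard library's `statistics.NormalDist`. This large-sample approximation is an assumption of the sketch; for small samples the t distribution itself should be used (for example `scipy.stats.t.sf` if SciPy is available).

```python
from statistics import NormalDist

def two_sided_p_value_normal(t_score):
    """P(|Z| >= |t_score|) under H0, using the normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t_score)))

print(round(two_sided_p_value_normal(1.96), 3))  # 0.05
print(round(two_sided_p_value_normal(2.58), 3))  # 0.01
```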

49
Q

What are the 7 classical assumptions?

A
  1. The regression model is linear, correctly specified and has an additive error term
  2. Error term has a 0 population mean
  3. All explanatory variables are uncorrelated with the error term
  4. Observations of the error term are uncorrelated with each other (no serial correlation)
  5. Constant variance in the error term (no heteroskedasticity)
  6. No explanatory variable is a perfect linear function of any other explanatory variable(s) (no perfect multicollinearity)
  7. The error term is normally distributed
50
Q

What does the Gauss-Markov Theorem state?

A

The Gauss-Markov Theorem states that, given Classical Assumptions 1-6, the OLS estimator of βk is the minimum variance estimator from among the set of all linear unbiased estimators of βk, for k = 0, 1, 2, ..., K.
OLS is the Best Linear Unbiased Estimator (BLUE)

51
Q

What is sensitivity analysis?

A

Purposely running a number of alternative specifications to determine whether particular results are robust

52
Q

What is data mining?

A

Exploring a data set to uncover empirical regularities that can inform economic theory