Topics 15-19 Flashcards

1
Q

Probability of independent, dependent, and mutually exclusive events, formulae

A

The probability of an independent event is unaffected by the occurrence of other events, but the probability of a dependent event is changed by the occurrence of another event.

  • Events A and B are independent if and only if: P(A | B) = P(A), or equivalently, P(B | A) = P(B)
  • The probability that at least one of two events will occur is P(A or B) = P(A) + P(B) - P(AB).
  • For mutually exclusive events, P(A or B) = P(A) + P(B), since P(AB) = 0.
2
Q

Joint probability of two events

A

The joint probability of two events, P(AB), is the probability that they will both occur.

P(AB) = P(A | B) x P(B)

For independent events, P(A | B) = P(A), so that

P(AB) = P(A) x P(B)
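
A minimal numeric sketch of the multiplication rule (the probabilities below are hypothetical, not from the card):

```python
# Joint probability via the multiplication rule: P(AB) = P(A|B) x P(B).
p_b = 0.40          # P(B), hypothetical
p_a_given_b = 0.25  # P(A | B), hypothetical

p_ab = p_a_given_b * p_b
print(f"P(AB) = {p_ab:.2f}")  # 0.10

# If A and B are independent, P(A|B) = P(A), so P(AB) = P(A) x P(B).
p_a = 0.25  # here P(A) equals P(A|B), i.e., A and B are independent
assert abs(p_a * p_b - p_ab) < 1e-12
```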

3
Q

Inferential statistics, definition

A

Inferential statistics pertain to the procedures used to make forecasts, estimates, or judgments about a large set of data on the basis of the statistical characteristics of a smaller set (a sample).

4
Q

Mode, definition

A

The mode is the value that occurs most frequently in a data set. A data set may have more than one mode or even no mode.

When a distribution has one value that appears most frequently, it is said to be unimodal. When a set of data has two or three values that occur most frequently, it is said to be bimodal or trimodal.

5
Q

Geometric mean compared to the arithmetic mean

A

The geometric mean is always less than or equal to the arithmetic mean, and the difference increases as the dispersion of the observations increases.

The only time the arithmetic and geometric means are equal is when there is no variability in the observations (i.e., all observations are equal).

6
Q

Expected value and some of its properties

A

The expected value is the weighted average of the possible outcomes of a random variable, where the weights are the probabilities that the outcomes will occur.

Some properties:

  • If X and Y are independent random variables, then E(XY) = E(X) x E(Y).
  • If X and Y are NOT independent, then in general E(XY) ≠ E(X) x E(Y); instead, E(XY) = E(X) x E(Y) + Cov(X,Y) (see the sketch below).
  • If X is a random variable, then in general E(X²) ≠ [E(X)]².
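
A minimal sketch checking the E(XY) identity on simulated data, assuming numpy is available (the means, covariance matrix, and seed are illustrative):

```python
# Check E(XY) = E(X)E(Y) + Cov(X, Y) using sample moments.
import numpy as np

rng = np.random.default_rng(42)
cov = [[1.0, 0.6], [0.6, 2.0]]  # illustrative covariance matrix, Cov(X,Y) = 0.6
x, y = rng.multivariate_normal([1.0, 2.0], cov, size=1_000_000).T

lhs = np.mean(x * y)
rhs = np.mean(x) * np.mean(y) + np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)  # equal up to floating-point error (exact identity for sample moments)
```
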
7
Q

Variance and some of its properties

A

Variance is defined as:

Var(X) = E[(X - μ)²]

Some properties:

  • Var(X) = E[(X - μ)²] = E(X²) - [E(X)]², where μ = E(X)
  • If a and c are constants, then:
    Var(aX + c) = a² x Var(X)
  • If X and Y are independent random variables, then (see the sketch below):
    Var(X + Y) = Var(X) + Var(Y)
    Var(X - Y) = Var(X) + Var(Y)
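
A minimal sketch of these variance properties on simulated independent variables, assuming numpy (the constants and seed are illustrative):

```python
# Check Var(aX + c) = a^2 Var(X) and, for independent X and Y,
# Var(X + Y) = Var(X - Y) = Var(X) + Var(Y).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 1_000_000)  # Var(X) = 4
y = rng.normal(0.0, 3.0, 1_000_000)  # Var(Y) = 9, independent of X

a, c = 5.0, 7.0
print(np.var(a * x + c), a**2 * np.var(x))  # both ~100
print(np.var(x + y), np.var(x - y))         # both ~13
```
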
8
Q

Covariance and some of its properties

A

Covariance is the expected value of the product of the deviations of the two random variables from their respective expected values.

Since we will be mostly concerned with the covariance of asset returns, the following formula is written in terms of the covariance between the return of asset i, Ri, and the return of asset j, Rj:

Cov(Ri,Rj) = E{[Ri - E(Ri)] x [Rj - E(Rj)]}

This equation simplifies to:

Cov(Ri,Rj) = E(Ri x Rj) - E(Ri) x E(Rj)

Some properties:

  • If a, b, c, and d are constants, then:
    Cov(a + bX, c + dY) = b x d x Cov(X,Y)
  • Cov(Z, aX + bY) = a x Cov(Z,X) + b x Cov(Z,Y)
  • If X and Y are NOT independent, then (see the sketch below):
    Var(X + Y) = Var(X) + Var(Y) + 2 x Cov(X,Y)
    Var(X - Y) = Var(X) + Var(Y) - 2 x Cov(X,Y)
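
A minimal sketch of the covariance cross-term, assuming numpy (the covariance matrix and seed are illustrative):

```python
# For correlated X and Y, Var(X +/- Y) picks up the 2 Cov(X, Y) term.
import numpy as np

rng = np.random.default_rng(1)
cov = [[4.0, 1.5], [1.5, 9.0]]  # Cov(X, Y) = 1.5, illustrative
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T

c = np.cov(x, y, bias=True)[0, 1]
print(np.var(x + y), np.var(x) + np.var(y) + 2 * c)  # equal up to floating point
print(np.var(x - y), np.var(x) + np.var(y) - 2 * c)  # equal up to floating point
```
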
9
Q

Correlation coefficient, definition

A

To make the covariance of two random variables easier to interpret, it may be divided by the product of the random variables’ standard deviations. The resulting value is called the correlation coefficient, or simply the correlation.

The relationship between covariance, standard deviations, and correlation can be seen in the following expression for the correlation of the returns for assets i and j:

Corr(Ri,Rj) = Cov(Ri,Rj) / [σ(Ri) x σ(Rj)]
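
A minimal sketch of this scaling, assuming numpy (the data are simulated with an illustrative covariance matrix):

```python
# Correlation = covariance / (sigma_i x sigma_j); compare with numpy's corrcoef.
import numpy as np

rng = np.random.default_rng(9)
cov = [[1.0, 0.5], [0.5, 2.0]]  # illustrative
ri, rj = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T

corr = np.cov(ri, rj, bias=True)[0, 1] / (ri.std() * rj.std())
print(corr, np.corrcoef(ri, rj)[0, 1])  # identical up to floating-point error
```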

10
Q

Central moments (skewness, kurtosis)

A

Central moments are measured relative to the mean (i.e., central around the mean). The k-th central moment is defined as:

E[(R - μ)^k] = Σ (i = 1 to n) pi x (Ri - μ)^k

where pi is the probability of event i and Ri is the return associated with event i.
1. Since central moments are measured relative to the mean, the first central moment equals zero and is, therefore, not typically used.
2. The second central moment is the variance of the distribution, which measures the dispersion of data.
3. The third central moment measures the departure from symmetry in the distribution. This moment will equal zero for a symmetric distribution (such as the normal distribution).

The skewness statistic is the standardized third central moment. Skewness (sometimes called relative skewness) refers to the extent to which the distribution of data is not symmetric around its mean. It is calculated as:
skewness = E[(R - μ)³]/σ³

4. The fourth central moment measures the degree of clustering in the distribution.

The kurtosis statistic is the standardized fourth central moment of the distribution. Kurtosis refers to the degree of peakedness or clustering in the data distribution and is calculated as:
kurtosis = E[(R - μ)⁴]/σ⁴

Kurtosis for the normal distribution equals 3. Therefore, the excess kurtosis for any distribution equals:

excess kurtosis = kurtosis - 3
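
A minimal sketch computing the standardized third and fourth central moments directly, assuming numpy and scipy (the simulated sample is illustrative):

```python
# Skewness and kurtosis as standardized central moments, checked against scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
r = rng.lognormal(0.0, 0.5, 1_000_000)  # a positively skewed sample

mu, sigma = r.mean(), r.std()
skew = np.mean((r - mu) ** 3) / sigma**3
kurt = np.mean((r - mu) ** 4) / sigma**4
print(skew, stats.skew(r))          # ~equal
print(kurt - 3, stats.kurtosis(r))  # scipy reports EXCESS kurtosis by default
```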

11
Q

Effect of Skewness on Mean, Median, and Mode

A

The key to remembering how measures of central tendency are affected by skewed data is to recognize that skew affects the mean more than the median and mode, and that the mean is “pulled” in the direction of the skew. For a positively skewed distribution, mode < median < mean; for a negatively skewed distribution, the order is reversed.

12
Q

Properties of kurtosis

A

Kurtosis is a measure of the degree to which a distribution is more or less “peaked” than a normal distribution. Leptokurtic describes a distribution that is more peaked than a normal distribution, whereas platykurtic refers to a distribution that is less peaked (or flatter) than a normal distribution. A distribution is mesokurtic if it has the same kurtosis as a normal distribution.

A distribution is said to exhibit excess kurtosis if it has either more or less kurtosis than the normal distribution. For example, the Poisson distribution has mean and variance both equal to lambda, λ, and its excess kurtosis is 1/λ, so it always has (slightly) heavy tails; Student’s t-distribution also has heavy tails, with excess kurtosis of 6/(df - 4) for df > 4.

13
Q

Coskewness and Cokurtosis

A

Previously, we identified moments and central moments for mean and variance. In a similar fashion, we can identify cross central moments for the concept of covariance. The third cross central moment is known as coskewness and the fourth cross central moment is known as cokurtosis.

14
Q

Desirable statistical properties of an estimator, The Best Linear Unbiased Estimator

A

There are certain statistical properties that make some estimates more desirable than others. These desirable properties of an estimator are unbiasedness, efficiency, consistency, and linearity.

  • An unbiased estimator is one for which the expected value of the estimator is equal to the parameter you are trying to estimate. For example, because the expected value of the sample mean is equal to the population mean [E(x̅)=μ], the sample mean is an unbiased estimator of the population mean.
  • An unbiased estimator is also efficient if the variance of its sampling distribution is smaller than all the other unbiased estimators of the parameter you are trying to estimate. The sample mean, for example, is an unbiased and efficient estimator of the population mean.
  • A consistent estimator is one for which the accuracy of the parameter estimate increases as the sample size increases. As the sample size increases, the sampling distribution bunches more closely around the population mean.
  • A point estimate is a linear estimator when it can be expressed as a linear function of the sample data.

If the estimator is the best available (i.e., has the minimum variance), exhibits linearity, and is unbiased, it is said to be the best linear unbiased estimator (BLUE).

Summary: desirable statistical properties of an estimator include unbiasedness (the sign of the estimation error is random), efficiency (lower sampling error than any other unbiased estimator), consistency (the variance of the sampling error decreases with sample size), and linearity (the estimator is a linear function of the sample data).

15
Q

Parametric and nonparametric distributions

A

Probability distributions are classified into two categories: parametric and nonparametric.

Parametric distributions, such as a normal distribution, can be described by using a mathematical function. These types of distributions make it easier to draw conclusions about the data; however, they also make restrictive assumptions, which are not necessarily supported by real-world patterns.

Nonparametric distributions, such as a historical distribution, cannot be described by using a mathematical function. Instead of making restrictive assumptions, these types of distributions fit the data perfectly; however, without generalizing the data, it can be difficult for a researcher to draw any conclusions.

16
Q

Uniform distribution, its mean and variance

A

A continuous uniform random variable is equally likely to take any value in the interval [a, b]. Its moments are:

  • mean = (a + b)/2
  • variance = (b - a)²/12

17
Q

Bernoulli distribution

A

A Bernoulli distributed random variable has only two possible outcomes, which can be labeled a “success” or a “failure.” A success, which occurs with probability p, may be denoted by the value 1; a failure, which occurs with probability 1 - p, may be denoted by the value 0.

Mean = p
Variance = p x (1 - p)

18
Q

Binomial distribution

A

A binomial random variable may be defined as the number of “successes” in a given number of trials, whereby the outcome can be either “success” or “failure.”

The binomial distribution tends to the normal distribution as n increases. It is a discrete distribution.

19
Q

Expected Value and Variance of a Binomial Random Variable

A

For a given series of n trials, the expected number of successes, or E(X), is given by the following formula:

expected value of X = E(X) = np

The intuition is straightforward; if we perform n trials and the probability of success on each trial is p , we expect np successes.

The variance of a binomial random variable is given by:
variance of X = np(1 - p) = npq, where q = 1 - p

For example, with n = 10 and p = 0.3: E(X) = 10 x 0.3 = 3 and variance = 10 x 0.3 x 0.7 = 2.1.

20
Q

The Poisson Distribution

A

The Poisson distribution is a discrete distribution whose mean and variance both equal its parameter, lambda (λ). It tends to the normal distribution as λ increases.

21
Q

Normal distribution and its density function

A

The normal distribution is a symmetric, bell-shaped continuous distribution that is completely described by its mean, μ, and variance, σ². Its probability density function is:

f(x) = [1/(σ x SQRT(2π))] x e^[-(x - μ)²/(2σ²)]

22
Q

Confidence Intervals for a Normal Distribution

A

For a normally distributed random variable with mean μ and standard deviation σ, the most commonly used confidence intervals are:

  • 90% confidence interval: μ ± 1.65σ
  • 95% confidence interval: μ ± 1.96σ
  • 99% confidence interval: μ ± 2.58σ

23
Q

The Standard Normal Distribution

A

A standard normal distribution (i.e., z-distribution) is a normal distribution that has been standardized so it has a mean of zero and a standard deviation of 1 [i.e., ~N(0, 1)].

To standardize an observation from a given normal distribution, the z-value of the observation must be calculated.

The z-value represents the number of standard deviations a given observation is from the population mean. Standardization is the process of converting an observed value for a random variable to its z-value. The following formula is used to standardize a random variable:

z = (x - μ)/σ

where x is the observed value, μ is the population mean, and σ is the population standard deviation.

24
Q

The Lognormal Distribution

A
  1. Distribution of Stock Price:

ln ST ~ N( ln S + (µ - σ²/2) x t, σ² x t )

>>> the first parameter, ln S + (µ - σ²/2) x t, gives you the mean, whereas the second, σ² x t, gives you the variance

  2. Distribution of Rate of Return:

~ N( µ - σ²/2, σ²/T )

>>> here again: the first parameter yields the mean, the second the variance

25
Q

The Central Limit Theorem and its important properties

A

The central limit theorem is extremely useful because the normal distribution is relatively easy to apply to hypothesis testing and to the construction of confidence intervals.

Specific inferences about the population mean can be made from the sample mean, regardless of the population's distribution, as long as the sample size is “sufficiently large,” which usually means n > 30.

Important properties of the central limit theorem include the following:

  • If the sample size n is sufficiently large (n > 30), the sampling distribution of the sample means will be approximately normal.
  • The mean of the population, μ, and the mean of the distribution of all possible sample means are equal.
  • The variance of the distribution of sample means is σ²/n, the population variance divided by the sample size (see the sketch below).
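
A minimal simulation sketch of these properties, assuming numpy (the exponential population, n = 40, and seed are illustrative):

```python
# Sample means of a skewed (exponential) population: mean ~ mu, variance ~ sigma^2/n.
import numpy as np

rng = np.random.default_rng(3)
n, trials = 40, 100_000
samples = rng.exponential(scale=2.0, size=(trials, n))  # population mean 2, variance 4

means = samples.mean(axis=1)
print(means.mean())  # ~2.0, the population mean
print(means.var())   # ~4/40 = 0.10, the population variance divided by n
```
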
26
Q

Student’s t-distribution and its properties

A

Student’s t-distribution, or simply the t-distribution, is a bell-shaped probability distribution that is symmetrical about its mean. It is the appropriate distribution to use when constructing confidence intervals based on small samples (n < 30) from populations with unknown variance and a normal, or approximately normal, distribution.

It may also be appropriate to use the t-distribution when the population variance is unknown and the sample size is large enough that the central limit theorem will assure that the sampling distribution is approximately normal.

Student’s t-distribution has the following properties:

  • It is symmetrical.
  • It is defined by a single parameter, the degrees of freedom (df), where the degrees of freedom are equal to the number of sample observations minus 1, n - 1, for sample means.
  • It has more probability in the tails (fatter tails) than the normal distribution.
  • As the degrees of freedom (the sample size) gets larger, the shape of the t-distribution more closely approaches a standard normal distribution.

When compared to the normal distribution, the t-distribution is flatter with more area under the tails (i.e., it has fatter tails). As the degrees of freedom for the t-distribution increase, however, its shape approaches that of the normal distribution.

Practically speaking, the greater the degrees of freedom, the greater the percentage of observations near the center of the distribution and the lower the percentage of observations in the tails, which are thinner as degrees of freedom increase. This means that confidence intervals for a random variable that follows a t-distribution must be wider (narrower) when the degrees of freedom are less (more) for a given significance level.

Note: variance = df/(df - 2), which is defined for df > 2.

Interesting facts:

  • To test the significance of a single partial slope coefficient in a (sample) multiple regression with three independent variables (aka regressors), we use a critical t with degrees of freedom (df) equal to the sample size minus four (n - 4).
  • Student’s t-distribution (with m degrees of freedom) is the distribution of the ratio of a standard normal random variable to the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m.
  • The t-distribution occurs as a function of other random variables, namely the normal and the chi-square distributions. If X is a standard normal random variable and Z is a chi-square distributed random variable with n degrees of freedom that is independent of X, then by definition the random variable Y = X/SQRT(Z/n) has a t-distribution with n degrees of freedom (see the sketch below).
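
A minimal simulation sketch of this construction, assuming numpy and scipy (df = 5 and the seed are illustrative):

```python
# Build t-distributed draws as Z / sqrt(W/n), Z ~ N(0,1), W ~ chi-square(n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n = 5  # degrees of freedom
z = rng.standard_normal(1_000_000)
w = rng.chisquare(n, 1_000_000)
y = z / np.sqrt(w / n)

print(y.var(), n / (n - 2))                             # variance ~ df/(df - 2)
print(np.quantile(y, 0.975), stats.t.ppf(0.975, df=n))  # matching quantiles
```
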
27
Q

Chi-Square distribution

A

Hypothesis testing of the population variance requires the use of a chi-squared distributed test statistic, denoted χ2.

The chi-square distribution is asymmetrical, bounded below by zero, and approaches the normal distribution in shape as the degrees of freedom increase.

The chi-squared test compares the test statistic to a critical chi-squared value at a given level of significance to determine whether to reject or fail to reject a null hypothesis.

Some interesting properties of chi-squared distribution:

  • it has a non-zero mean;
  • the mean of the distribution is equal to the number of degrees of freedom;
  • the variance is equal to two times the number of degrees of freedom;
  • the sum of two independent chi-squared variables, with k1 and k2 degrees of freedom respectively, is itself chi-squared with (k1 + k2) degrees of freedom;
  • both the chi-squared and the F-distribution are non-negative and positively skewed (skewed to the right).
28
Q

F-distribution

A

The hypotheses concerned with the equality of the variances of two populations are tested with an F-distributed test statistic. Hypothesis testing using a test statistic that follows an F-distribution is referred to as the F-test.

The F distribution is also used to test the joint hypothesis that the partial slope coefficients in a multiple regression are significant; i.e., is the overall multiple regression significant?

The F-test is used under the assumption that the populations from which samples are drawn are normally distributed and that the samples are independent.

The test statistic for the F-test is the ratio of the sample variances. The F-statistic is computed as:

F = s1²/s2²

where:
s1² = variance of the sample of n1 observations drawn from Population 1
s2² = variance of the sample of n2 observations drawn from Population 2

The shape of the F-distribution is determined by two separate degrees of freedom, the numerator degrees of freedom, df1, and the denominator degrees of freedom, df2.

Some additional properties of the F-distribution include the following:

  • The F-distribution approaches the normal distribution as the number of observations increases (just as with the t-distribution and chi-squared distribution).
  • A random variable’s t-value squared (t²) with n - 1 degrees of freedom is F-distributed with 1 degree of freedom in the numerator and n - 1 degrees of freedom in the denominator (see the sketch after this list).
  • There exists a relationship between the F- and chi-squared distributions such that:

F = χ²/(number of observations in the numerator)

as the number of observations in the denominator → ∞

  • The F-ratio is linked to R²:

F = (R²/k)/[(1 - R²)/(N - k - 1)]
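
A minimal check of the t²-to-F relationship above, assuming scipy (df = 20 and the 5% level are illustrative):

```python
# A squared two-tailed t critical value with df degrees of freedom equals
# the upper-tail F critical value with (1, df) degrees of freedom.
from scipy import stats

df = 20
t_crit = stats.t.ppf(0.975, df)    # two-tailed 5% critical value
f_crit = stats.f.ppf(0.95, 1, df)  # upper 5% critical value
print(t_crit**2, f_crit)           # both ~4.35
```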

29
Q

Mixture distributions

A

To illustrate a mixture distribution, suppose that the returns of a stock follow a normal distribution with low volatility 75% of the time and high volatility 25% of the time. Here we have two normal distributions with the same mean but different risk levels. To create a mixture distribution from these scenarios, we randomly choose either the low- or high-volatility distribution, placing a 75% probability on selecting the low-volatility distribution. We then generate a random return from the selected distribution. By repeating this process many times, we will create a probability distribution that reflects both levels of volatility.

By mixing distributions, it is easy to see how we can alter skewness and kurtosis of the component distributions. Skewness can be changed by combining distributions with different means, and kurtosis can be changed by combining distributions with different variances. Also, by combining distributions that have significantly different means, we can create a mixture distribution with multiple modes (e.g., a bimodal distribution).
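
A minimal simulation sketch of the mixture described above, assuming numpy and scipy (the 10%/30% volatilities and seed are illustrative):

```python
# 75% low-volatility / 25% high-volatility normal returns with a common mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 1_000_000
low_vol = rng.random(n) < 0.75         # True -> low-volatility regime
sigma = np.where(low_vol, 0.10, 0.30)  # illustrative volatilities
returns = rng.normal(0.0, sigma)

print(stats.kurtosis(returns))  # positive excess kurtosis: mixing variances fattens tails
```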

30
Q

Bayes’ theorem

A

Bayes’ theorem for two random variables A and B is defined as follows:

P(A | B) = [P(B | A) x P(A)] / P(B)

The conditional probability P(A | B) is read as the probability of event A occurring, given that event B has already occurred.

P(AB) = P(A | B) x P(B)
P(AB) = P(B | A) x P(A)
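
A minimal numeric sketch of Bayes’ theorem (all probabilities below are hypothetical):

```python
# Update P(A) given evidence B, using the total probability rule for P(B).
p_a = 0.10              # prior P(A), hypothetical
p_b_given_a = 0.80      # P(B | A), hypothetical
p_b_given_not_a = 0.40  # P(B | not A), hypothetical

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # total probability
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.3f}")  # ~0.182
```
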
31
Q

Bayesian approach vs. Frequentist approach

A
  • The frequentist approach involves drawing conclusions from sample data based on the frequency of that data. For example, the approach suggests that the probability of a positive event will be 100% if the sample data consists of only observations that are positive events. The primary difference between the Bayesian approach and the frequentist approach is that the Bayesian approach is instead based on a prior belief regarding the probability of an event occurring.
  • The Bayesian approach requires a beginning assumption regarding probabilities. These prior assumptions are often based on a frequentist approach (i.e., number of events occurring during a sample period) or some other subjective analysis.
  • With small sample sizes the Bayesian approach is often used in practice. With larger sample sizes, most analysts tend to use the frequentist approach. The frequentist approach is also often used because it is easier to implement and understand than the Bayesian approach.
32
Q

Sample variance, definition

A

The sample variance, s², is the measure of dispersion that applies when we are evaluating a sample of n observations from a population. The sample variance is calculated using the following formula:

s² = Σ(xi - x̅)²/(n - 1)

where x̅ is the sample mean and the sum is taken over all n observations.

33
Q

Population and sample covariances, definition

A

The population and sample covariances are calculated as:

  • population covariance: Cov(X,Y) = Σ[(Xi - μX)(Yi - μY)]/N, where μX and μY are the population means
  • sample covariance: Cov(X,Y) = Σ[(Xi - X̅)(Yi - Ȳ)]/(n - 1), where X̅ and Ȳ are the sample means

34
Q

Confidence intervals

A

Confidence interval estimates result in a range of values within which the actual value of a parameter will lie with probability 1 - α. Here alpha, α, is called the level of significance for the confidence interval, and the probability 1 - α is referred to as the degree of confidence. For example, we might estimate that the population mean of random variables will range from 15 to 25 with a 95% degree of confidence, or at the 5% level of significance.

Confidence intervals are usually constructed by adding or subtracting an appropriate value from the point estimate. In general, confidence intervals take on the following form (see the sketch after the list below):

point estimate ± (reliability factor x standard error)

where:

  • point estimate = value of a sample statistic of the population parameter
  • reliability factor = number that depends on the sampling distribution of the point estimate and the probability that the point estimate falls in the confidence interval, (1 - α)
  • standard error = standard error of the point estimate
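
A minimal sketch of this construction for a sample mean with known σ, assuming scipy (the sample numbers are illustrative):

```python
# point estimate +/- (reliability factor x standard error)
import math
from scipy import stats

x_bar, sigma, n = 20.0, 5.0, 64  # illustrative sample mean, known sigma, sample size
alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)  # reliability factor, ~1.96
se = sigma / math.sqrt(n)          # standard error of the sample mean

print(f"95% CI: ({x_bar - z * se:.2f}, {x_bar + z * se:.2f})")  # ~(18.78, 21.22)
```
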
35
Q

Confidence interval for normal distribution with a known variance

A

If the population has a normal distribution with a known variance, a confidence interval for the population mean can be calculated as:

x̅ ± zα/2 x σ/SQRT(n)

where zα/2 is the standard normal reliability factor that leaves α/2 of probability in each tail (e.g., 1.96 for 95% confidence).

36
Q

Criteria for Selecting the Appropriate Test Statistic

A

If we are sampling from a nonnormal distribution (which is sometimes the case in finance), we cannot rely on the central limit theorem to construct a confidence interval unless the sample size is at least 30. So, all else equal, make sure you have a sample of at least 30, and the larger, the better.

All of the preceding analysis depends on the sample we draw from the population being random. If the sample isn’t random, the central limit theorem doesn’t apply, our estimates won’t have the desirable properties, and we can’t form unbiased confidence intervals.

Surprisingly, creating a random sample is not as easy as one might believe. There are a number of potential mistakes in sampling methods that can bias the results. These biases are particularly problematic in financial research, where available historical data are plentiful, but the creation of new sample data by experimentation is restricted.

37
Q

Hypothesis testing

A

Hypothesis testing is the statistical assessment of a statement or idea regarding a population.

For instance, a statement could be, “The mean return for the U.S. equity market is greater than zero.” Given the relevant returns data, hypothesis testing procedures can be employed to test the validity of this statement at a given significance level.

A hypothesis is a statement about the value of a population parameter developed for the purpose of testing a theory or belief.

38
Q

The null hypothesis and the alternative hypothesis

A

The null hypothesis, designated H0, is the hypothesis the researcher wants to reject. It is the hypothesis that is actually tested and is the basis for the selection of the test statistics. The null is generally a simple statement about a population parameter. Typical statements of the null hypothesis for the population mean include H0: μ=μ0, H0: μ≤μ0, and H0: μ≥μ0, where μ is the population mean and μ0 is the hypothesized value of the population mean.

The null hypothesis always includes the “equal to” condition.

The alternative hypothesis, designated HA, is what is concluded if there is sufficient evidence to reject the null hypothesis. It is usually the alternative hypothesis the researcher is really trying to assess. Why? Since you can never really prove anything with statistics, when the null hypothesis is discredited, the implication is that the alternative hypothesis is valid.

39
Q

Test statistic

A

Hypothesis testing involves two statistics:

  • the test statistic calculated from the sample data and
  • the critical value of the test statistic.
40
Q

One-tailed and two-tailed tests of hypotheses

A

The alternative hypothesis can be one-sided or two-sided. A one-sided test is referred to as a one-tailed test, and a two-sided test is referred to as a two-tailed test. Whether the test is one- or two-sided depends on the proposition being tested. If a researcher wants to test whether the return on stock options is greater than zero, a one-tailed test should be used.

However, a two-tailed test should be used if the research question is whether the return on options is simply different from zero. Two-sided tests allow for deviation on both sides of the hypothesized value (zero). In practice, most hypothesis tests are constructed as two-tailed tests.

41
Q

Two-tailed test

A

A two-tailed test for the population mean may be structured as:

H0: μ=μ0 versus HA: μ ≠ μ0

Since the alternative hypothesis allows for values above and below the hypothesized parameter, a two-tailed test uses two critical values (or rejection points).

The general decision rule for a two-tailed test is:

Reject H0 if:

  • test statistic > upper critical value or
  • test statistic < lower critical value
42
Q

One-tailed test

A

For a one-tailed hypothesis test of the population mean, the null and alternative hypotheses are either:

  • Upper tail: H0: μ ≤ μ0 versus HA: μ > μ0, or
  • Lower tail: H0: μ ≥ μ0 versus HA: μ < μ0
43
Q

Type I and Type II Errors

A

Hypothesis testing is used to make inferences about the parameters of a given population on the basis of statistics computed for a sample that is drawn from that population. We must be aware that there is some probability that the sample, in some way, does not represent the population and any conclusion based on the sample about the population may be made in error.

When drawing inferences from a hypothesis test, there are two types of errors:

  • Type I error: the rejection of the null hypothesis when it is actually true.
  • Type II error: the failure to reject the null hypothesis when it is actually false.
44
Q

The Power of a Test

A

While the significance level of a test is the probability of rejecting the null hypothesis when it is true, the power of a test is the probability of correctly rejecting the null hypothesis when it is false. The power of a test is actually one minus the probability of making a Type II error, or 1 - P (Type II error). In other words, the probability of rejecting the null when it is false (power of the test) equals one minus the probability of not rejecting the null when it is false (Type II error). When more than one test statistic may be used, the power of the test for the competing test statistics may be useful in deciding which test statistic to use. Ordinarily, we wish to use the test statistic that provides the most powerful test among all possible tests.

Sample size and the choice of significance level (Type I error probability) will together determine the probability of a Type II error. The relation is not simple, however, and calculating the probability of a Type II error in practice is quite difficult. Decreasing the significance level (probability of a Type I error) from 5% to 1%, for example, will increase the probability of failing to reject a false null (Type II error) and, therefore, reduce the power of the test. Conversely, for a given sample size, we can increase the power of a test only with the cost that the probability of rejecting a true null (Type I error) increases. For a given significance level, we can decrease the probability of a Type II error and increase the power of a test, only by increasing the sample size.
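
A minimal sketch computing the power of an upper-tail z-test, assuming scipy (the hypothesized mean, true mean, σ, and n are illustrative):

```python
# Power = 1 - P(Type II error) for an upper-tail z-test.
import math
from scipy import stats

mu0, mu_true, sigma, n = 0.0, 0.5, 2.0, 50  # illustrative values
alpha = 0.05
se = sigma / math.sqrt(n)

crit = mu0 + stats.norm.ppf(1 - alpha) * se         # reject H0 if x_bar > crit
beta = stats.norm.cdf(crit, loc=mu_true, scale=se)  # P(fail to reject | H0 false)
print(f"power = {1 - beta:.3f}")  # increases with n, alpha, or the size of the true shift
```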

45
Q

Statistical significance vs. economic significance.

A

Statistical significance does not necessarily imply economic significance.

For example, we may have tested a null hypothesis that a strategy of going long all the stocks that satisfy some criteria and shorting all the stocks that do not satisfy the criteria resulted in returns that were less than or equal to zero over a 20-year period. Several factors must be considered:

  • One important consideration is transactions costs.
  • Taxes are another factor that may make a seemingly attractive strategy a poor one in practice.
  • A third reason that statistically significant results may not be economically significant is risk. In the above strategy, we have additional risk from short sales (they may have to be closed out earlier than in the test strategy). Since the statistically significant results were for a period of 20 years, it may be the case that there is significant variation from year to year in the returns from the strategy, even though the mean strategy return is greater than zero.
46
Q

The p-value

A

The p-value is the probability of obtaining a test statistic that would lead to a rejection of the null hypothesis, assuming the null hypothesis is true. It is the smallest level of significance for which the null hypothesis can be rejected. For one-tailed tests, the p-value is the probability that lies above the computed test statistic for upper tail tests or below the computed test statistic for lower tail tests. For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.

Many researchers report p-values without selecting a significance level and allow the reader to judge how strong the evidence for rejection is.

47
Q

The t-test

A

When hypothesis testing, the choice between using a critical value based on the t-distribution or the z-distribution depends on:

  • sample size,
  • the distribution of the population, and
  • whether the variance of the population is known.

Use the t-test if the population variance is unknown and either of the following conditions exist:

  • The sample is large (n > 30).
  • The sample is small (n < 30), but the distribution of the population is normal or approximately normal.

If the sample is small and the distribution is non-normal, we have no reliable statistical test.

In the real world, the underlying variance of the population is rarely known, so the t-test enjoys widespread application.

48
Q

The z-test

A

The z-test is the appropriate hypothesis test of the population mean when the population is normally distributed with known variance. The computed test statistic used with the z-test is referred to as the z-statistic. The z-statistic for a hypothesis test for a population mean is computed as follows:

z-statistic = (x̅ - μ0)/(σ/SQRT(n))

where:
x̅ = sample mean
μ0 = hypothesized population mean
σ = standard deviation of the population
n = sample size
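
A minimal sketch computing the z-statistic and its two-tailed p-value, assuming scipy (the sample numbers are illustrative):

```python
# z-statistic and two-tailed p-value for a test of the population mean.
import math
from scipy import stats

x_bar, mu0, sigma, n = 21.0, 20.0, 4.0, 64  # illustrative values
z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
print(z, p_value)  # z = 2.0, p ~ 0.0455 -> reject H0 at the 5% significance level
```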

49
Q

The chi-squared test

A

The chi-squared test is used for hypothesis tests concerning the variance of a normally distributed population. Letting σ² represent the true population variance and σ0² represent the hypothesized variance, the hypotheses for a two-tailed test of a single population variance are structured as:

H0: σ² = σ0² versus HA: σ² ≠ σ0²

The hypotheses for one-tailed tests are structured as:

  • H0: σ² ≤ σ0² versus HA: σ² > σ0², or
  • H0: σ² ≥ σ0² versus HA: σ² < σ0²
50
Q

The F-Test

A

The hypotheses concerned with the equality of the variances of two populations are tested with an F-distributed test statistic. Hypothesis testing using a test statistic that follows an F-distribution is referred to as the F-test. The F-test is used under the assumption that the populations from which samples are drawn are normally distributed and that the samples are independent.

If we let σ12 and σ22 represent the variances of normal Population 1 and Population 2, respectively, the hypotheses for the two-tailed F-test of differences in the variances can be structured as:

H0: σ1² = σ2² versus HA: σ1² ≠ σ2²

and the one-sided test structures can be specified as:

H0: σ1² ≤ σ2² versus HA: σ1² > σ2², or

H0: σ1² ≥ σ2² versus HA: σ1² < σ2²

The test statistic for the F-test is the ratio of the sample variances. The F-statistic is computed as:

F = s1²/s2²

where:
s1² = variance of the sample of n1 observations drawn from Population 1
s2² = variance of the sample of n2 observations drawn from Population 2

51
Q

Chebyshev’s inequality

A

Chebyshev’s inequality states that for any set of observations, whether sample or population data and regardless of the shape of the distribution, the percentage of the observations that lie within k standard deviations of the mean is at least

1 - 1/k² for all k > 1

For example, with k = 2, at least 1 - 1/4 = 75% of the observations lie within two standard deviations of the mean.

52
Q

Backtesting

A

Backtesting is the process of comparing losses predicted by the value at risk (VaR) model to those actually experienced over the sample testing period. If a model were completely accurate, we would expect VaR to be exceeded with the same frequency predicted by the confidence level used in the VaR model. In other words, the probability of observing a loss amount greater than VaR should be equal to the level of significance.

When the VaR measure is exceeded during a given testing period, it is known as an exception or an exceedance.

One of the main issues with backtesting VaR models is that exceptions are often serially correlated; that is, there is a high probability that an exception will occur after the previous period had an exception. Another issue is that the occurrence of exceptions tends to be correlated with overall market volatility; the number of VaR exceptions tends to be higher (lower) when market volatility is high (low). This may be the result of a VaR model failing to react quickly to changes in risk levels.

53
Q

Standard error of the sample mean

A

Standard error of the sample mean = SQRT (sample variance / n) = sample standard deviation / SQRT(n)

54
Q

How to calculate standard error in hypothesis testing for the difference between two means?

A

SE = SQRT(variance1/n1 + variance2/n2)

55
Q

Sampling error is?

A

Sampling error is the (absolute) difference between the sample mean and the population mean.