Quantitative Methods Flashcards

1
Q

Numerical Data (e.g. Discrete, Continuous)

A

Values that can be counted or measured.

We can perform mathematical operations only on numerical data.

2
Q

Categorical Data (e.g. Nominal, Ordinal)

A

consist of labels that can be used to classify a set of data into groups. Categorical data may be nominal or ordinal.

3
Q

Discrete Data

A

Countable data, such as the number of months, days, or hours in a year.

4
Q

Continuous Data

A

Can take any fractional value (e.g., the annual percentage return on an investment).

5
Q

Nominal Data

A

Data that cannot be placed in a logical order

6
Q

Ordinal Data

A

Can be ranked in logical order

7
Q

Structured Data

A

Data that can be organised in a defined way

8
Q

Time series

A

A set of observations taken periodically, e.g. at equal intervals over time

9
Q

Cross-sectional data

A

Refers to a set of comparable observations all taken at one specific point in time.

10
Q

Panel Data

A

Time series and cross-sectional data combined

11
Q

Unstructured Data

A

A mix of data with no defined structure

12
Q

One-dimensional array

A

represents a single variable (e.g. a time series)

13
Q

Two-dimensional array

A

Represents two variables (e.g. panel data)

14
Q

Contingency table

A

A two-dimensional array that displays the joint frequencies of two variables

15
Q

Confusion matrix

A

A contingency table (two variables) that displays predicted and actual occurrences of an event

16
Q

Relationship between geometric and arithmetic mean

A

The geometric mean is always less than or equal to the arithmetic mean, and the difference increases as the dispersion of the observations increases
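A minimal Python sketch (illustrative return figures assumed, not from the curriculum) showing that the geometric mean never exceeds the arithmetic mean and that the gap widens with dispersion:

import statistics

low_dispersion = [0.04, 0.05, 0.06]    # hypothetical annual returns, tightly clustered
high_dispersion = [-0.20, 0.05, 0.30]  # same arithmetic mean, much wider spread

for returns in (low_dispersion, high_dispersion):
    arith = statistics.mean(returns)
    # geometric mean of returns is computed on (1 + r) growth factors
    geo = statistics.geometric_mean([1 + r for r in returns]) - 1
    print(f"arithmetic={arith:.4f}  geometric={geo:.4f}  gap={arith - geo:.4f}")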

17
Q

Trimmed mean

A

Estimates the mean after excluding a stated percentage of the largest and smallest observations, removing the effect of outliers.

18
Q

Winsorized mean

A

Decreases the effect of outliers on the mean by replacing a stated percentage of the largest and smallest observations with less extreme values rather than excluding them.

19
Q

Harmonic mean

A

Calculate the average share cost from periodic purchases in a fixed dollar amount.
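A short Python sketch (hypothetical purchase prices assumed) showing that the harmonic mean is the average cost per share when a fixed dollar amount is invested at each price:

import statistics

prices = [8.00, 10.00, 12.50]   # hypothetical purchase prices
dollars_per_purchase = 1000     # fixed amount invested at each price

total_shares = sum(dollars_per_purchase / p for p in prices)
average_cost = dollars_per_purchase * len(prices) / total_shares  # total spent / total shares
print(average_cost)                          # about 9.84
print(statistics.harmonic_mean(prices))      # identical result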

20
Q

Empirical probability

A

established by analysing past data (outcomes)

21
Q

A priori probability

A

determined using a reasoning and inspection process (not data), e.g. looking at a coin and deciding there is a 50/50 chance of each outcome.

22
Q

Subjective probability

A

Established using personal judgement

23
Q

Unconditional probability (marginal probability)

A

the probability of an event regardless of the past or future occurrence of other events.

24
Q

Conditional probability

A

The probability of an event given that another event has occurred, i.e. where the occurrence of one event affects the probability of another, e.g. P(A | B)

25
Multiplication rule of probability
P(AB) = P(A | B) × P(B)
26
Addition rule of probability
P(A or B) = P(A) + P(B) - P(AB)
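A worked Python check of the two probability rules above, using assumed illustrative probabilities:

p_b = 0.40          # P(B), assumed
p_a_given_b = 0.30  # P(A | B), assumed
p_a = 0.25          # P(A), assumed

p_ab = p_a_given_b * p_b       # multiplication rule: 0.30 * 0.40 = 0.12
p_a_or_b = p_a + p_b - p_ab    # addition rule: 0.25 + 0.40 - 0.12 = 0.53
print(p_ab, p_a_or_b)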
27
Probability distribution
The probabilities of all the possible outcomes for a random variable
28
A discrete random variable
When the number of possible outcomes can be counted and each outcome has a measurable (positive) probability, e.g. the number of days it may rain in a month.
29
A continuous random variable
When the number of possible outcomes is infinite, even if upper and lower bounds exist, e.g. the amount of rainfall per month.
30
The probability function , p(x)
gives the probability that a discrete random variable will take on the value x
31
A cumulative probability function (cdf) , F(x)
gives the probability that a random variable will be less than or equal to a given value.
32
Binomial random variable: E(X) and Var(X)
E(X) = np | Var(X) = np(1 - p)
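A quick Python check (n and p assumed) that the binomial probability function gives a mean of np and a variance of np(1 - p):

from math import comb

n, p = 10, 0.3   # assumed number of trials and probability of success
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * pr for x, pr in enumerate(pmf))             # 3.0 = np
var = sum((x - mean)**2 * pr for x, pr in enumerate(pmf))  # 2.1 = np(1 - p)
print(mean, var)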
33
For a continuous random variable X, the probability of any single value of X is
0
34
The normal distribution has the following key properties:
- It is completely described by its mean, μ, and variance, σ^2, stated as X ~ N(μ, σ^2). In words, this says that "X is normally distributed with mean μ and variance σ^2."
- Skewness = 0 (symmetrical), meaning that the normal distribution is symmetric about its mean, so that P(X ≤ μ) = P(μ ≤ X) = 0.5, and mean = median = mode.
- Kurtosis = 3; this measures how peaked the distribution is and how much probability sits in its tails. Recall that excess kurtosis is measured relative to 3, the kurtosis of the normal distribution.
- A linear combination of normally distributed random variables is also normally distributed.
- The probabilities of outcomes further above and below the mean get smaller and smaller but do not go to zero (the tails get very thin but extend infinitely).
35
univariate distribution
 the distribution of a single random variable
36
A multivariate distribution
the distribution of two or more random variables (takes into account correlation coefficients) - specifies the probabilities associated with a group of random variables and is meaningful only when the behavior of each random variable in the group is in some way dependent on the behavior of the others.
37
Number of correlations in a portfolio
0.5 × n × (n - 1), where n = number of assets (variables) in the portfolio
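For example, under an assumed portfolio size:

n = 20                    # assumed number of assets
print(n * (n - 1) // 2)   # 190 distinct pairwise correlations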
38
Normal distribution: +/-1 s.d. from the mean
68% confidence interval
39
Normal distribution: +/-1.65 s.d. from the mean
90% confidence interval
40
Normal distribution: +/-1.96 s.d. from the mean
95% confidence interval
41
Normal distribution: +/-2.58 s.d. from the mean
99% confidence interval
42
"standardizing a random variable" (finding z)
measuring how far it lies from the arithmetic mean | z = the no. of standard deviations the variable is from the mean
43
How to calculate z | how many standard deviations a variable is from the mean
z = (x - population mean) / standard deviation = (x - µ) / σ
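A one-line worked example (observation, mean, and standard deviation assumed):

x, mu, sigma = 7.0, 5.0, 2.5   # assumed observation, population mean, population s.d.
print((x - mu) / sigma)        # z = 0.8 standard deviations above the mean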
44
shortfall risk
probability that a portfolio return or value will be below a target return or value
45
Roy's safety first ratio (SF ratio)
SF ratio = (E(Rp) - RL) / σp: the number of standard deviations the target (threshold) return, RL, lies below the expected portfolio return. The larger the SF ratio, the lower the probability of falling below the minimum threshold.
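A sketch (two hypothetical portfolios assumed) comparing safety-first ratios against a minimum acceptable return:

threshold = 0.02   # assumed minimum acceptable (target) return

portfolios = {"A": (0.08, 0.12),   # assumed (expected return, standard deviation)
              "B": (0.06, 0.05)}
for name, (exp_ret, sd) in portfolios.items():
    sf_ratio = (exp_ret - threshold) / sd
    print(name, round(sf_ratio, 2))   # A: 0.5, B: 0.8, so B is preferred by this criterion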
46
For a standard normal distribution, F(0) is:
0.5. By the symmetry of the standard normal (z) distribution, half the distribution lies on each side of the mean, so F(0) = 0.5. (LOS 4.j)
47
Holding period return --> Continuously compounded rate
ln(1 + holding period return), where ln is the natural log
48
Continuously compounded rate --> Holding period return
e^(continuously compounded rate) - 1
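A round-trip check of the two conversions above (holding period return assumed):

from math import exp, log

hpr = 0.10              # assumed 10% holding period return
cc_rate = log(1 + hpr)  # continuously compounded rate, about 0.0953
print(cc_rate, exp(cc_rate) - 1)   # the second value recovers the original 10%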
49
t-distribution
- Symmetrical.
- Defined by degrees of freedom (df); for sample means, df = number of sample observations - 1 (n - 1).
- More probability in the tails ("fatter tails") than the normal distribution.
- As the degrees of freedom (the sample size) get larger, the shape of the t-distribution more closely approaches a standard normal distribution.
- Not platykurtic: it is less peaked than the normal distribution but has fatter tails. The lower the degrees of freedom, the fatter the tails and the greater the probability of extreme outcomes.
50
Chi-square distribution (χ^2), used to test the variance of a normally distributed population
- Distribution of the sum of squared values of n independent standard normal random variables (takes only positive values)
- Asymmetric
- Degrees of freedom = n - 1
- As the degrees of freedom increase, the distribution approaches the normal distribution
51
Degrees of freedom, k (in context of distribution charts)
the number of independent values that are free to vary, given the information (such as the mean) already estimated
52
F-distribution, used to test the equality of the variances of two populations
- Quotient of two chi-square distributions with m and n degrees of freedom (takes only positive values)
- Asymmetric
- As the degrees of freedom increase, the distribution approaches the normal distribution
53
F-stat formula | F-distribution
F-stat = (χ^2 for sample 1 / m) / (χ^2 for sample 2 / n), where m and n are the two samples' degrees of freedom
54
Monte Carlo Simulation used to estimate a distribution of asset prices
Generate thousands of simulated outcomes for the asset from assumed distributions of its input variables, then calculate the mean/variance of the outcomes and value the asset accordingly
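A minimal sketch of the idea (the normal return distribution and all parameters are assumed for illustration, not prescribed by the curriculum):

import random
import statistics

random.seed(42)
s0, mu, sigma, periods, trials = 100.0, 0.07, 0.20, 12, 10_000  # assumed inputs

terminal_prices = []
for _ in range(trials):
    price = s0
    for _ in range(periods):
        # one simulated monthly return drawn from an assumed normal distribution
        price *= 1 + random.gauss(mu / periods, sigma / periods ** 0.5)
    terminal_prices.append(price)

# summarise the simulated distribution of outcomes
print(statistics.mean(terminal_prices), statistics.stdev(terminal_prices))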
55
Use of Monte Carlo Simulation
- Value complex securities.
- Simulate the profits/losses from a trading strategy.
- Calculate estimates of value at risk (VaR) to determine the riskiness of a portfolio of assets and liabilities.
- Simulate pension fund assets and liabilities over time to examine the variability of the difference between the two.
- Value portfolios of assets that have non-normal returns distributions.
56
Binomial random variable
When there are only two possible outcomes of a given event.
57
Sampling error
The difference between a sample statistic and its corresponding population parameter. Sampling error of the mean = sample mean - population mean = x̄ - µ
58
The standard error of the sample mean (when the population standard deviation is known)
The standard deviation of the distribution of sample means | σx̄ = σ / √n
59
Effect on the standard error of the sample mean if the sample size (n) increases?
n ↑ ⇒ standard error ↓
60
Desirable characteristics of an estimator (sample statistic)
Unbiased Efficient Consistent
61
Simple random sampling
Selecting a sample where each item in the population has the same probability of being chosen
62
Stratified random sampling
randomly selecting samples proportionally from sub-groups. Sub-groups are formed based on one or more defining characteristics
63
Cluster sampling
Similar to stratified random sampling, but the subgroups (clusters) are not necessarily formed from the data's characteristics. One-stage: randomly selected clusters form the sample in their entirety. Two-stage: a random sample is drawn from within each randomly selected cluster.
64
Central limit theorem
For a population with a mean (µ) and a finite variance (σ^2), the sampling distribution of the sample mean approaches a normal distribution with mean µ and variance σ^2/n as the sample size becomes large (commonly n ≥ 30), regardless of the population's own distribution.
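A simulation sketch (population and sample sizes assumed) showing sample means clustering around µ with a spread close to σ/√n, even though the population itself is skewed:

import random
import statistics

random.seed(1)
population = [random.expovariate(1.0) for _ in range(100_000)]   # skewed, non-normal population
mu, sigma = statistics.mean(population), statistics.pstdev(population)

n = 50
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]

print(statistics.mean(sample_means), mu)                   # close to the population mean
print(statistics.stdev(sample_means), sigma / n ** 0.5)    # close to sigma / sqrt(n)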
65
Confidence interval
A range of values within which the population mean is expected to lie with a given probability
66
Reliability factor for 90% confidence interval
1.645
67
Reliability factor for 95% confidence interval
1.96
68
Reliability factor for 99% confidence interval
2.575
69
Confidence interval for a single item selected from the population
population mean (µ) +/- reliability factor * σ
70
Confidence interval for a point estimate (values used to estimate population parameters) selected from a sample
Sample mean +/- reliability factor * standard error
71
Confidence interval for a sample mean
population mean (µ) +/- reliability factor * standard error
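A worked example (sample data assumed) of the point-estimate interval above, sample mean +/- reliability factor * standard error, at the 95% level:

import statistics

sample = [10.2, 9.8, 11.5, 10.9, 9.4, 10.7, 11.1, 10.0]   # assumed observations
n = len(sample)
mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / n ** 0.5   # sample s.d. / sqrt(n)

z = 1.96   # 95% reliability factor (strictly, a t-value fits better for a small sample)
print(mean - z * std_err, mean + z * std_err)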
72
Which test statistic should be used for a normal distribution with a known variance?
z-statistic
73
Which test statistic should be used for a normal distribution with an unknown variance?
t-statistic
74
Which test statistic should be used for a non-normal distribution with a known variance?
z-statistic | NB not available with a small sample n<30
75
Which test statistic should be used for a non-normal distribution with an unknown variance?
t-statistic | NB not available with a small sample n<30
76
Jackknife method of estimating standard error of the sample mean
Calculate the s.d. of multiple sample means (each sample with one observation removed from the sample). - Computationally simple - Used when population is small - Removes bias from statistical estimates
77
Bootstrap method of estimating standard error of the sample mean
Calculate the s.d. of the sample means from repeated samples of size n, each drawn with replacement from the observed sample.
78
Two issues of the idea that larger samples increase accuracy of understanding the population
- May contain observations from a different (unintended) population | - Additional cost of data collection
79
Data snooping
Repeatedly using the same sample of observations to search for a pattern until one is found - leads to 'data-snooping bias'
80
Sample selection bias
When certain observations are systematically excluded from the analysis (usually due to lack of available data)
81
Survivorship bias
Only including active/live data. e.g. only including active funds in an analysis of fund performance.
82
Time-period bias
Using data within a time period that is either too long or too short
83
Look-ahead bias
When a study tests a relationship with data that was not available on the test date.
84
Stratified random sampling is most often used to preserve the distribution of risk factors when creating a portfolio to track an index of:
Corporate bonds | risk factors (the strata) can be more easily identified, and these form the basis of the sample
85
If random variable Y follows a lognormal distribution then the natural log of Y must be:
normally distributed.
86
Steps involved in hypothesis testing:
- State the hypothesis
- Select the test statistic
- Specify the level of significance
- State the decision rule for the hypothesis
- Collect the sample and calculate the sample statistics
- Make a decision about the hypothesis
- Make a decision based on the test results
87
Null hypothesis (Ho)
- Always includes the '=' condition - A null stated as a simple equality implies a two-tailed test - The hypothesis the researcher wants to reject
88
Alternative hypothesis (Ha)
- What is concluded if the null hypothesis is rejected
89
General decision rule for a two-tailed test:
Reject Ho (null hypothesis) if: test statistic > upper critical value, or test statistic < lower critical value (in one of the outer tails)
90
test statistic equation
(sample statistic - hypothesized value) / SE of sample statistic
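A worked z-test sketch (sample figures assumed) that applies the test statistic formula and the two-tailed decision rule at 5% significance:

sample_mean, hypothesized, sigma, n = 10.5, 10.0, 2.0, 64   # assumed values
std_err = sigma / n ** 0.5
z = (sample_mean - hypothesized) / std_err   # (10.5 - 10.0) / 0.25 = 2.0

critical = 1.96   # two-tailed critical value at 5% significance
print("reject H0" if abs(z) > critical else "fail to reject H0")   # reject H0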
91
Type I error
Rejecting the null hypothesis when it is true
92
Type II error
Failing to reject the null hypothesis when it is false. The probability of a Type II error is affected by the sample size and the choice of significance level.
93
Probability of making a Type I error | wrongly rejecting null hypothesis
The significance level (α)
94
Probability of correctly rejecting null hypothesis?
The power of the test = 1 - the probability of making a Type II error
95
What is the decision rule for rejecting or failing to reject the null hypothesis based on?
the distribution of the test statistic
96
Statistical significance
refers to the use of a sample to carry out a statistical test meant to reveal any significant deviation from the stated null hypothesis.
97
Economic significance
the degree to which a statistically significant result is economically meaningful (e.g., after transaction costs, taxes, and risk are considered)
98
p-value
Probability of obtaining a test statistic that would lead to a rejection of the null hypothesis, assuming the null hypothesis is true. It is the smallest level of significance at which the null hypothesis can be rejected.
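A sketch (test statistic assumed) of a two-tailed p-value computed from the standard normal CDF via math.erf:

from math import erf, sqrt

def std_normal_cdf(z):
    # cumulative probability for a standard normal variable
    return 0.5 * (1 + erf(z / sqrt(2)))

z = 2.0                                      # assumed test statistic
p_value = 2 * (1 - std_normal_cdf(abs(z)))   # about 0.0455: reject at 5%, not at 1%
print(p_value)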
99
When is it appropriate to use a z-test as the appropriate hypothesis test of the population mean?
Normal distribution and known variance
100
When is it appropriate to use a t-test as the appropriate hypothesis test of the population mean?
Unknown variance
101
Critical z-values for 10% level of significance
Two-tailed test: +/-1.65 | One-tailed test: +1.28 or -1.28
102
Critical z-values for 5% level of significance
Two-tailed test: +/-1.96 | One-tailed test: +1.65 or -1.65
103
Critical z-values for 1% level of significance
Two-tailed test: +/-2.58 | One-tailed test: +2.33 or -2.33
104
Difference in means test
Two populations that are independent and normally distributed
105
Paired comparisons test
Two populations that are dependent of each other and normally distributed
106
How to test for the variance of a normally distributed population
The chi-squared test
107
How to test whether the variances of two normal populations are equal
The F -test
108
Parametric tests
based on assumptions about population distribution and parameters (e.g. mean = 3, variance = 100)
109
Non-parametric tests
based on minimal/no assumptions about the population, and test things other than parameter values (e.g. rank correlation tests, runs tests)
110
How to test whether two characteristics in a sample of data are independent of each other?
The χ^2 (chi-square) test of independence, based on a contingency table
111
The appropriate test statistic for a test of the equality of variances for two normally distributed random variables, based on two independent random samples, is:
the F-test.
112
The appropriate test statistic to test the hypothesis that the variance of a normally distributed population is equal to 13 is:
the χ2 test. A test of the population variance is a chi-square test.
113
The test statistic for a Spearman rank correlation test for a sample size greater than 30 follows:
a t-distribution. The test statistic for the Spearman rank correlation test follows a t-distribution.
114
Assumptions of Linear Regression
- Linear relationship between the dependent and independent variables
- Variance of the residual term is constant (homoskedasticity)
- Residual terms are independently and normally distributed
115
Coefficient of Determination (R^2)
= SSR / SST: measures the percentage of the total variation in Y explained by the variation in X. For simple regression, R^2 = (correlation between X and Y)^2
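A small sketch (toy data assumed) computing R^2 = SSR/SST for a simple regression fitted by ordinary least squares:

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed independent variable
y = [2.1, 3.9, 6.2, 8.1, 9.8]   # assumed dependent variable
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# OLS slope and intercept for the simple regression of y on x
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)   # explained (regression) sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)             # total sum of squares
print(ssr / sst)   # R^2, which equals the squared correlation of X and Y here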
116
Factorial function
The factorial function, denoted n!, tells how many different ways n items can be arranged where all the items are included.
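For example:

from math import factorial
print(factorial(4))   # 24 ways to arrange 4 distinct items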
117
Coefficient of Variation
σ/µ
118
For a test of the equality of two variances
F-statistic.
119
unbiased estimator
the expected value equals the parameter it is intended to estimate.
120
A consistent estimator
the probability of estimates close to the value of the population parameter increases as sample size increases.