Stats Flashcards
What is quantitative data?
Data expressed as numerical values.
What is continuous data?
Continuous data: represents variables that cannot be counted but can be measured.
Discrete data: can take only countable values, typically integers.
The set of all possible outcomes is called
The sample space.
When we repeat a random experiment several times, we call each repetition a ?
Trial
Any subset of the sample space is called an
Event
We can get either an even number or an odd number, but not both. Such events are called ?
Mutually exclusive
In Bayesian statistics, what is a posterior probability?
The posterior probability is calculated by updating the prior probability using Bayes’ theorem.
Example of mutually exclusive events
Both football teams cannot win the same match
Probability mass function (PMF): gives the probability that a discrete random variable X is equal to a certain value. A probability density function (PDF) plays the corresponding role for continuous variables.
Example of a discrete variable
A countable outcome, e.g. the number of kids in a class
Continuous random variable
A continuous random variable can take an infinite number of values, e.g. height
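Area under the curve:
Represents the total probability (equal to 1) in the case of continuous variables
The y-axis in a probability density function represents
The probability density, not the probability itself; probabilities come from the area under the curve over an interval.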
What is the exponential distribution?
A continuous distribution that is often used to model the time one needs to wait until an event occurs
A left-skewed (negatively skewed) distribution has a long tail in which direction?
The left; the mean is less than the mode.
A right-skewed (positively skewed) distribution has a long tail in
The right direction
Higher kurtosis implies ?
Fatter tails: more probability of extreme values. Kurtosis above 3 is leptokurtic and more risky.
Mesokurtic Distribution
Normal distribution
Leptokurtic Distribution
Thin and tall with fatter tails (higher to leap over); more risky
Platykurtic Distribution
Fat and wide shallow: like a plate
log-normal distribution
Income of people
The shape of the t-distribution is determined by its
Degrees of freedom
What is the exponential distribution used to measure?
The probability distribution of the time between events
Standard normal distribution is when ?
The mean is 0 and the standard deviation is 1.
Which distribution is often used to model asset prices, as they cannot be negative?
A lognormal distribution
Does the t-distribution have fat or thin tails?
The t-distribution has fatter tails than the normal distribution
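What is inferential statistics?
Using sample data to draw conclusions and make predictions about a population
CLT
Central limit theorem: the distribution of sample means approaches a normal distribution as the sample size grows, which helps us build confidence intervals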
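A small simulation can make the central limit theorem card concrete. This is only a sketch: the exponential population, the sample sizes and the helper name `sample_means` are illustrative assumptions, not part of the original deck.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_means(sample_size, n_samples=2000):
    """Draw n_samples samples from an exponential population and return their means."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


for n in (2, 30, 200):
    means = sample_means(n)
    # The assumed population has mean 1.0 and standard deviation 1.0, so the
    # spread of the sample means should be close to 1 / sqrt(n).
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

Even though the population is strongly skewed, the sample means cluster around the population mean and their spread shrinks roughly like 1 / sqrt(n).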
The population
Is the entire group of interest; a sample is a representative subset of the population
The sample mean is a random variable as it varies from sample to sample.
Yes
The null hypothesis is held true
Until we have evidence to reject it
The p-value
The probability of observing a value at least as extreme as the test statistic, assuming the null hypothesis is true
Type I error
Rejection of an actually true null hypothesis
Type II error
The failure to reject a null hypothesis that is actually false
A statistical hypothesis is a statement
About a population parameter, which may or may not be true.
Which is set first, the significance level or the p-value?
The significance level; setting it in advance helps us avoid bias
If the p-value is less than the chosen significance level then we ?
We reject the null hypothesis, i.e. the sample gives reasonable evidence to support the alternative hypothesis.
If the obtained p-value is greater than the chosen significance level then we ?
We do not reject the null hypothesis.
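The decision rule on the two cards above can be sketched in a few lines. The function names, the default significance level of 0.05 and the use of a z statistic are assumptions made for illustration only.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical test statistics, chosen only to show both outcomes.
for z in (0.8, 2.5):
    p = two_sided_p_value(z)
    print(z, round(p, 4), decide(p))
```

Note that alpha is fixed before the p-value is computed, in line with the card about setting the significance level in advance.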
Analysing the relationship between two or more variables uses what type of statistical analysis?
Covariance and correlation
Covariance: positive, negative and 0
Positive: the variables move in the same direction. Negative: they move in opposite directions. Zero: the two variables have no linear relationship.
Pearson’s correlation analysis
Is used to establish the direction and strength of a linear relationship; the coefficient lies between -1 and 1
Correlation and causation
Correlation does not imply causation
Correlation is a ?
Standardized version of covariance. The value of the Pearson’s correlation coefficient lies between +1 & -1
The variance of a random variable is ?
The variance of a random variable is nothing but the covariance of that variable with itself.
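The last two cards (correlation as standardized covariance, and variance as the covariance of a variable with itself) can be checked numerically. The data below are made up for illustration, and the helper names are hypothetical.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
y = [2.0, 4.1, 5.9, 8.2, 9.8]

print(covariance(x, x), statistics.variance(x))  # the two values agree
print(correlation(x, y))                         # close to +1 for this data
```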
The predicted variable is known as ?
The dependent variable; it depends on the independent variable(s)
A regression line is estimated using a method called
Ordinary least squares (OLS)
R² (coefficient of determination):
The proportion of the variance in the dependent variable that the model explains; the higher the R², the better the fit
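For a single independent variable, the OLS line and R² can be computed directly from their textbook formulas. This is a minimal sketch with made-up data; `ols_fit` and `r_squared` are hypothetical helper names.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data only
ys = [2.9, 5.2, 6.8, 9.1, 11.0]
slope, intercept = ols_fit(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```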
Value of the F statistic
The higher the F statistic, the stronger the evidence that the model as a whole is significant
Is multicollinearity good or bad?
Bad; the Variance Inflation Factor (VIF) is used to check for it.
What do you want in your data: heteroskedasticity or homoskedasticity?
Homoskedasticity; check with the Cook-Weisberg test
Normality of errors, what is the test?
Kolmogorov-Smirnov test or Shapiro-Wilk test
Should error terms or residuals be correlated? What is the test?
Residuals should not be correlated. A Durbin-Watson statistic (DW) close to 2 indicates no autocorrelation.
Can there be multiple independent variables in a linear regression model?
Yes. Simple linear regression has one independent variable and one dependent variable; multiple linear regression has several independent variables
R-squared value goes up or down with more variables ?
When more variables are added to the regression model, the R-squared value typically increases. It can never decrease on adding a variable.
Multicollinearity is a desired condition for building a regression model.
No
In y = mx + c, which is beta and which is alpha?
Beta = m (the slope) and alpha = c (the intercept)
How do we work out whether to reject a null hypothesis?
Reject if the p-value is less than the significance level (100% - confidence level)
Alpha: is it good or bad?
A large positive alpha is good.
What is Bayes’ theorem?
Bayes’ theorem, named after Thomas Bayes, describes the probability of an event A based on prior knowledge of a related condition B.
P(A|B) = P(A ∩ B) / P(B) = P(B|A) * P(A) / P(B)
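A worked example helps show how Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), turns a prior into a posterior. The screening-test numbers below are hypothetical, and `posterior` is an illustrative helper name.

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem, expanding P(B) with the law of total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)
    return p_b_given_a * prior / p_b


# Hypothetical numbers: 1% prior, 95% true-positive rate, 5% false-positive rate.
print(posterior(prior=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05))
```

With these assumed inputs the posterior is about 16%, far below the 95% true-positive rate, because the prior is so small.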
Expected value
E[X] = SUM(value * probability) over all possible values
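The expected-value sum can be sketched for a fair six-sided die; the die is just an illustrative choice and `expected_value` is a hypothetical helper name.

```python
def expected_value(values, probabilities):
    """Sum of value * probability over all outcomes."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in zip(values, probabilities))


faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # fair die: every face equally likely
print(expected_value(faces, probs))
```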
Covariance
How stocks move together; if they move in line, the covariance is positive and high.
Correlation is ?
-1 <= corr <= 1: if negative, the stocks tend to move in opposite directions.
How to scale the standard deviation of a stock over a time horizon T
= standard deviation * SQRT(T)
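The square-root-of-time rule is easy to apply in code. The sketch below assumes returns are independent across periods (the usual assumption behind the rule); the daily volatility of 1% and the 252 trading days are illustrative values.

```python
import math


def scale_volatility(std_per_period, periods):
    """Scale a per-period standard deviation to a horizon of `periods` periods."""
    return std_per_period * math.sqrt(periods)


daily_vol = 0.01  # hypothetical 1% daily standard deviation
print(scale_volatility(daily_vol, 252))  # roughly 0.159, i.e. about 15.9% annualized
```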
Is geometric return the same as compounded return?
Yes
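To see why the geometric return is the compounded return, the sketch below computes the geometric mean of some made-up period returns and checks that compounding it reproduces the total growth. `geometric_return` is a hypothetical helper name.

```python
import math


def geometric_return(returns):
    """Per-period return that, compounded over all periods, gives the same total growth."""
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0


period_returns = [0.10, -0.05, 0.20]  # illustrative returns only
g = geometric_return(period_returns)
print(g)                            # geometric (compounded) average return
print(sum(period_returns) / 3)      # arithmetic average, which is higher
```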
Hit ratio
Positive trades / all trades
Normalized hit ratio above 65%
(Profitable trades * average % win) / ((winning trades * average % win) + (losing trades * average % loss))
Kelly fraction
Used to work out the best percentage of wealth to invest
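The card does not give a formula, so the sketch below uses the standard Kelly formula for a simple binary bet, f = p - (1 - p) / b, where p is the win probability and b is the ratio of the average win to the average loss. Whether this simplified form fits a given strategy is an assumption; the inputs are hypothetical.

```python
def kelly_fraction(win_probability, win_loss_ratio):
    """Kelly fraction for a binary bet: f = p - (1 - p) / b."""
    if not 0.0 <= win_probability <= 1.0:
        raise ValueError("win_probability must lie in [0, 1]")
    if win_loss_ratio <= 0.0:
        raise ValueError("win_loss_ratio must be positive")
    return win_probability - (1.0 - win_probability) / win_loss_ratio


# Hypothetical strategy: wins 60% of the time, average win equals average loss.
print(kelly_fraction(0.6, 1.0))
```

A negative result means the bet has a negative edge under these assumptions, so nothing should be invested.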
What is a good Sharpe ratio?
(Average return - risk-free rate) / standard deviation; above 2 is good
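A minimal sketch of the ratio on this card: average return divided by the standard deviation of returns. The optional `risk_free` argument (default 0) and the sample returns are assumptions for illustration, and no annualization is applied.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """(Mean return - risk-free rate) / sample standard deviation of returns."""
    excess_mean = statistics.fmean(returns) - risk_free
    return excess_mean / statistics.stdev(returns)


sample_returns = [0.01, 0.02, 0.03]  # hypothetical per-period returns
print(sharpe_ratio(sample_returns))
```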
Drawdown
The decline from the peak value to the lowest subsequent point
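Maximum drawdown can be computed in one pass by tracking the running peak. This is a sketch on a made-up equity curve, expressing the drawdown as a fraction of the peak; `max_drawdown` is a hypothetical helper name.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the peak value."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


equity = [100.0, 120.0, 90.0, 110.0, 130.0, 104.0]  # illustrative values only
print(max_drawdown(equity))  # the 120 -> 90 fall is the largest decline
```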
Hit ratio
Number of wins / the sum of all trades
For example, if you have 51 wins and 3 losses
Divide 51 by 54: a hit ratio of 94.4%
Normalized hit ratio
(Number of wins * average % win) / ((wins * average % win) + (losses * average % loss))
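Both ratios can be sketched with the 51 wins and 3 losses from the example. The average win of 2% and average loss of 10% are hypothetical values added only to show how the normalized ratio can differ sharply from the raw hit ratio.

```python
def hit_ratio(wins, losses):
    """Number of winning trades divided by all trades."""
    return wins / (wins + losses)


def normalized_hit_ratio(wins, avg_win_pct, losses, avg_loss_pct):
    """Wins weighted by average % win, over wins and losses weighted by their average %."""
    won = wins * avg_win_pct
    lost = losses * avg_loss_pct
    return won / (won + lost)


print(hit_ratio(51, 3))                         # about 0.944
print(normalized_hit_ratio(51, 0.02, 3, 0.10))  # about 0.773 with the assumed averages
```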
If the p-values for all four coefficients are almost 0, the coefficients are
Statistically significant at a very high level of confidence.
ARCH
The Autoregressive Conditional Heteroskedasticity method models a variance that changes over time in a time series, such as increasing or decreasing volatility
Homoscedasticity, heteroscedasticity
Heteroscedastic data has variance that changes over time, such as seasonal volatility; homoscedastic data has constant variance and is much closer to stationary
PACF
Partial autocorrelation function: the partial correlation of a stationary time series with its own lagged values, after removing the effect of shorter lags
MA or AR: which one captures surprises (sudden shocks)?
MA models try to capture the idiosyncratic shocks observed in financial markets.
Test to check the normality of the residuals
Jarque-Bera test
In the ADF test, if the p-value is greater than the level of significance, we conclude that:
The series is non-stationary
If a time series process is non-linear, which type of model would likely describe it better?
Multiplicative
Seasonality component
The ‘seasonality’ component of a time series model does not try to capture the average value of the process. It shows similarities or repeating patterns over time.
Which of the following is categorized as the non-systematic component of a time series model?
Noise
Autocorrelation function (ACF):
Measures the correlation of a variable with a lagged version of itself. This is also called serial correlation
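The lag-k autocorrelation can be sketched with the usual sample estimator: the lagged cross-products of deviations from the mean, divided by the total sum of squared deviations. The alternating series is a made-up example and `acf` is a hypothetical helper name.

```python
def acf(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


alternating = [1.0, -1.0] * 10  # flips sign every step
print(acf(alternating, 0))  # 1.0: a series is perfectly correlated with itself
print(acf(alternating, 1))  # strongly negative: each value opposes the previous one
print(acf(alternating, 2))  # strongly positive: values repeat every two steps
```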
Which of the following statistical properties should remain constant in time for a time series to be stationary?
Mean
Variance
Covariance
Implied volatility
Is forward-looking: an estimate of future volatility computed from option prices, which are driven by supply and demand
Which of the following is a covered call strategy?
Buy a stock and sell an ATM call.
A protective put strategy
Built by going long on a stock and simultaneously buying a put option.
A trader writes a put option
at a strike price of INR 800 and receives a premium of INR 30. What is his profit or loss at expiry when the stock is trading at INR 840?
Writing means selling the put. With the stock at 840, above the 800 strike, the put expires worthless and the trader keeps the premium: a profit of INR 30
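The payoff of a written (short) put at expiry is the premium received minus whatever the put is worth to its buyer, max(strike - spot, 0). The sketch below uses the strike of 800 and premium of 30 from the card; the second spot price of 750 is a hypothetical extra case.

```python
def short_put_profit(strike, premium, spot_at_expiry):
    """Profit or loss per share for the writer of a put option at expiry."""
    intrinsic_value = max(strike - spot_at_expiry, 0.0)  # what the writer must pay out
    return premium - intrinsic_value


print(short_put_profit(800.0, 30.0, 840.0))  # put expires worthless: keep the premium
print(short_put_profit(800.0, 30.0, 750.0))  # hypothetical fall below the strike: a loss
```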
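Cointegrated vs correlated
Two non-stationary time series are cointegrated when a linear combination of them (the spread) is stationary; they do not have to be correlated. Correlated series tend to move in a similar direction.
ADF
Augmented Dickey-Fuller test: checks for stationarity, and is applied to the spread of two series to check cointegration
Lambda < 0
If lambda is significantly less than 0, we reject the null hypothesis of a unit root and state that the series is stationary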
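What does a negative gradient indicate when price changes are regressed on the lagged price?
Mean reversion, which is consistent with stationarity; a downward-sloping price chart on its own shows a trend, not stationarity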
What is covariance?
The relationship between two variables; when positive, both move in the same direction
What is the difference between covariance and correlation?
Covariance shows the direction in which two variables move together; correlation is standardized, so it also shows the strength of the relationship.