Midterm Flashcards
What is Supervised Regression Learning?
“Supervised” - provide example X, Y pairs
“Regression” - numerical prediction (as opposed to classification)
“Learning” - train with data
Examples: Linear Regression (parametric learning), KNN (instance based - keep data and consult it), Decision Trees, Decision Forests
03-01
What is Backtesting?
Roll back time and test your system. Let the system look at only a subset of the data, then use it to predict the “future,” i.e., data we have but haven’t let the system see.
03-01
What are some problems with regression?
- It’s Noisy and Uncertain
- It’s challenging to Estimate Confidence
- It’s unclear how long to hold a position or how much to allocate
Some of these issues can be addressed using Reinforcement Learning (learn a policy that tells us when to buy/sell)
03-01
What is Linear Regression?
Parametric Learning
y = mx + b
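A minimal sketch of the parametric approach with NumPy (the data values are made up for illustration):

```python
import numpy as np

# Toy training data: x = feature, y = target
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit y = mx + b; np.polyfit returns coefficients highest-degree first
m, b = np.polyfit(x, y, 1)

# Query time: the training data can be discarded, only m and b remain
def predict(x_query):
    return m * x_query + b

print(predict(6.0))
```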
03-02
What is K Nearest Neighbor?
Instance-based approach where we keep the data and consult it when we make a query.
If k = 3, we find the 3 nearest historical data points and take the mean of those 3 when querying. One con is that at the beginning/end of the data range, predictions flatten into horizontal lines because there are no more data points in that direction.
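A minimal sketch of KNN regression on 1-D data (the function and data are illustrative):

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Average the y values of the k training points nearest the query."""
    distances = np.abs(x_train - x_query)  # distance to every stored point
    nearest = np.argsort(distances)[:k]    # indices of the k closest points
    return y_train[nearest].mean()         # equal-weight mean of their y values

x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([2.0, 4.1, 5.9, 8.2, 10.1])
print(knn_predict(x_train, y_train, 3.5))
```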
Another similar method is called Kernel Regression.
03-02
What is Kernel Regression?
Kernel Regression weights the contributions of data points based on how close they are to the query, vs. KNN, where each of the k nearest points gets an equal weight.
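A minimal sketch of kernel regression with a Gaussian kernel; the specific kernel and bandwidth here are illustrative assumptions, not the course’s prescription:

```python
import numpy as np

def kernel_predict(x_train, y_train, x_query, bandwidth=1.0):
    """Weighted average of all y values; closer points get larger weights."""
    weights = np.exp(-((x_train - x_query) ** 2) / (2 * bandwidth ** 2))
    return np.sum(weights * y_train) / np.sum(weights)

x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([2.0, 4.1, 5.9, 8.2, 10.1])
print(kernel_predict(x_train, y_train, 3.5))
```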
03-02
What is the difference between Parametric and Instance based learning?
Parametric learning uses the data to train a model and then discards it.
Instance-based learning keeps the data and uses it at query time.
03-02
When to use Parametric vs. Non-Parametric models?
When we have a bias (an initial guess about the form of the underlying relationship), we should use a parametric model.
If we don’t have a guess, a non-parametric model will probably fit the data better (we keep the data to query on).
03-02
KNN: As K increases, what happens to the fit?
If K = the size of the data set, the prediction is a horizontal line at the overall mean, since every query averages all the points. If K = 1, we just find the single nearest data point, so the model matches the training data exactly.
Therefore, as we decrease K, we are more likely to OVERFIT the data.
03-03
Parametric: As D (degree) increases, what happens to the fit?
d = 1, y = m1x + b
d = 2, y = m2x^2 + m1x + b
etc.
Therefore, as D increases, we are more likely to OVERFIT the data.
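A quick sketch of this effect using polynomial fits on noisy linear data (the data generation and degrees are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)  # noisy line

for d in (1, 3, 9):
    coeffs = np.polyfit(x, y, d)
    rmse = np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    print(d, rmse)  # in-sample error shrinks as d grows: a sign of overfitting
```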
03-03
Which would you expect to be larger? In Sample Error (training set) or Out of Sample Error (test set)?
Out of Sample
03-03
What are two ways to visualize or evaluate the accuracy of an algorithm?
RMSE (root mean squared error) and Correlation (Ytest vs. Ypredict)
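A minimal sketch of computing both with NumPy (the arrays are made-up predictions):

```python
import numpy as np

y_test = np.array([1.0, 2.0, 3.0, 4.0])
y_predict = np.array([1.1, 1.9, 3.2, 3.8])

rmse = np.sqrt(np.mean((y_test - y_predict) ** 2))
corr = np.corrcoef(y_predict, y_test)[0, 1]  # off-diagonal of the 2x2 matrix
print(rmse, corr)
```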
03-03
As RMSE error increases, what does correlation do? (increase, decrease, not sure)
In most cases, as RMSE increases, correlation decreases, but there are some cases where the opposite happens, so...
We can’t be sure.
03-03
What is overfitting?
Picture a plot with error on the Y axis and degrees of freedom (D) on the X axis.
As D increases, in-sample error decreases (the model fits the training data more and more closely, until D reaches the number of data points).
Out-of-sample error, however, eventually starts to rise as D increases. The region where in-sample error keeps falling while out-of-sample error grows is where overfitting occurs.
03-03
Linear Regression vs. KNN: which is better for saving space?
Linear Regression (don’t need to store all the data)
03-03
Linear Regression vs. KNN: which is better for train time?
KNN (no time to train)
03-03
Linear Regression vs. KNN: which is better for query time?
Linear Regression (plug in numbers)
03-03
Linear Regression vs. KNN: which is better for adding new data?
KNN (can add more data without retraining or recomputing parameters)
03-03
Why do we use Ensemble Learners?
- Lower error than any individual method by itself
- Less overfitting (because each learner has its own bias)
03-04
What is Bootstrap Aggregating (bagging)?
Same learning algorithm, but train each learner on a different random subset of the data (sampled with replacement).
Developed by Leo Breiman in the 1990s.
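A minimal sketch of bagging, using a degree-1 polyfit as a stand-in base learner (the learner choice and names are illustrative):

```python
import numpy as np

def bagged_predict(x, y, x_query, m_bags=20, seed=0):
    """Train one learner per bag (sampled with replacement); average predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(m_bags):
        idx = rng.integers(0, len(x), size=len(x))  # n draws with replacement
        m, b = np.polyfit(x[idx], y[idx], 1)        # train this bag's learner
        preds.append(m * x_query + b)
    return np.mean(preds)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.1])
print(bagged_predict(x, y, 3.5))
```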
03-04
What is Boosting (Ada(ptive) Boost)?
In subsequent creations of bags, each data instance is weighted based on the previous learner’s error (points that were not predicted well). Points with significant error are more likely to get picked for the next bag.
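A minimal sketch of the weighted-sampling idea only, not the full Ada Boost algorithm (the weight-update rule shown is an illustrative assumption):

```python
import numpy as np

def boosted_bag(x, y, weights, rng):
    """Sample a bag where high-error points are more likely to be picked."""
    probs = weights / weights.sum()
    idx = rng.choice(len(x), size=len(x), replace=True, p=probs)
    return x[idx], y[idx]

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.1])
errors = np.array([0.1, 0.1, 2.0, 0.1, 0.1])  # the 3rd point was predicted badly
weights = np.exp(errors)                      # one illustrative weighting scheme
print(boosted_bag(x, y, weights, rng))        # the 3rd point shows up more often
```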
03-04
Which is more likely to overfit as M (bags) increases? Simple Bagging or Ada Boost?
Ada Boost, because as M increases, Ada Boost assigns more and more specific data points to subsequent learners, trying to model all the difficult examples.
03-04
What are Exchange-Traded Funds?
- Buy/sell like stocks
- Baskets of stocks
- Transparent
- Liquid (easy to trade, lots of dollar value trading each day)
Basket of equities allocated in such a way that the overall portfolio tracks the performance of a stock exchange index
02-01
What are Mutual Funds?
- Buy/sell at end of day
- Quarterly disclosure (stated goals, so you know what they’re trying to achieve)
- Less transparent
02-01
What are Hedge Funds?
- Buy/sell agreement (hard to exit)
- No disclosure
- Not transparent
02-01
What are Assets Under Management (AUM)?
Total amount of money being managed by the fund.
02-01
How are fund managers compensated for ETFs, Mutual Funds, and Hedge Funds?
ETFs - expense ratio of AUM (0.01% - 1%)
Mutual Funds - expense ratio of AUM (0.5% - 3%)
Hedge Funds - “Two and Twenty”
2% of AUM + 20% of profits
02-01
Two and Twenty with $100M AUM and a 15% return. What is your compensation?
“Two” - $100M * 0.02 = $2M
“Twenty” - profit = $100M * 0.15 = $15M; $15M * 0.2 = $3M
Total = $5M compensation
02-01
What are the 3 major types of investors in Hedge Funds? And Why?
- Individuals (wealthy, usually around 100 of them)
- Institutions (large retirement funds, university foundations, non-profit institutions)
- Funds of funds
Why would they invest in you?
- track record (~5 years)
- simulation + story (a reason why the method works)
- good portfolio fit (why your strategy complements their portfolio)
02-01
What are the goals and metrics Hedge Funds go after?
Goals
- Beat a benchmark, e.g., the S&P 500; the benchmark you choose should depend on your expertise
- Absolute return (make slow, gradual, positive returns no matter what, by going long/short)
Metrics
- Cumulative Return = (last val / first val) - 1
- Volatility = standard deviation of daily returns (daily_returns.std())
- Risk/Reward = Sharpe Ratio
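A minimal sketch of these metrics with pandas, assuming daily prices, a zero risk-free rate, and sqrt(252) annualization for the Sharpe ratio (the price series is made up):

```python
import pandas as pd

prices = pd.Series([100.0, 101.5, 100.8, 102.4, 103.0])

daily_returns = prices / prices.shift(1) - 1             # day-over-day change
cumulative_return = prices.iloc[-1] / prices.iloc[0] - 1
volatility = daily_returns.std()

# Annualized Sharpe ratio (risk-free rate assumed to be 0)
sharpe = (252 ** 0.5) * daily_returns.mean() / daily_returns.std()
print(cumulative_return, volatility, sharpe)
```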
02-01