Midterm Flashcards
What are CSV files?
CSV - Comma Separated Values
Header Row, separated by commas
Data Rows, separated by commas
Which fields would you expect to see in a CSV file of stock data?
- # of employees
- Date/Time
- Company Name
- Price of the stock
- Company’s hometown
- Date/time
- Price of the stock
What does real stock data look like?
Header: Date, Open, High, Low, Close, Volume, Adjusted Close
Close - closing price reported at exchange
Volume - volume sold
Adjusted Close - closing price adjusted by the data provider to account for stock splits and dividend payments. Looking back in time, the rate of return computed from Adjusted Close should be larger than the one computed from Close.
What is a data frame?
Columns represent the stock symbols (separate dataframes can hold different kinds of data: Adj Close, Volume, Close, etc.)
Rows represent time
What pandas code would allow you to print the first or last 5 rows of the DataFrame df?
import pandas as pd
df = pd.read_csv("data/AAPL.csv")
First 5 rows:
print(df.head())
Last 5 rows:
print(df.tail())
Last n rows:
print(df.tail(n))
How do you view a specific range of rows in a data frame? For example, rows 10 to 20?
df = pd.read_csv("data/AAPL.csv")
print(df[10:21])
Note that the second number is not inclusive in the range
How do you compute the max closing price for a stock using pandas?
df = pd.read_csv("data/{}.csv".format(symbol))
max_value = df['Close'].max()
How do you compute the mean volume for a symbol?
df = pd.read_csv("data/{}.csv".format(symbol))
mean_volume = df['Volume'].mean()
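The reading, slicing and summarizing cards above can be tried without a data file. The sketch below parses a small in-memory CSV instead of data/AAPL.csv; the three rows are made-up illustrative values, not real quotes, and the column layout is assumed to match the header described earlier.

```python
import io
import pandas as pd

# Made-up rows for illustration only (not real market data).
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2010-01-04,10.0,10.5,9.8,10.2,1000,9.9
2010-01-05,10.2,10.8,10.1,10.6,1500,10.3
2010-01-06,10.6,10.7,10.0,10.1,1200,9.8
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first rows (all three here)
print(df[1:3])          # rows 1 and 2; the end of the slice is exclusive

max_close = df["Close"].max()      # highest closing price
mean_volume = df["Volume"].mean()  # average daily volume
print(max_close, mean_volume)
```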
How would you plot the adjusted close of the following data?
import matplotlib.pyplot as plt
df = pd.read_csv("data/AAPL.csv")
print(df['Adj Close'])
df['Adj Close'].plot()
plt.show()
Select the ‘High’ column from the dataframe and then plot it.
df = pd.read_csv("data/XXX.csv")
print(df['High'])
df['High'].plot()
plt.show()
How do you plot two columns, such as 'Close' and 'Adj Close'?
df = pd.read_csv("data/AAPL.csv")
df[['Close', 'Adj Close']].plot()
plt.show()
How many days were US stocks traded at NYSE in 2014?
252
What is S&P 500 and what is SPY?
S&P 500 - Stock Market Index based on 500 large American companies listed on the NYSE or NASDAQ. Essentially a weighted mean of the stock prices of the companies
SPY - SPDR S&P 500 - An ETF (Exchange-Traded Fund) that tracks the S&P 500 index
How do you create an empty data frame (df1) with a given datetime range?
start_date = '2010-01-22'
end_date = '2010-01-26'
dates = pd.date_range(start_date, end_date)
df1 = pd.DataFrame(index=dates)
Using an empty data frame (df1) with a specified date range, how do you join df1 to a data frame for SPY (dfSPY)?
df1 = pd.DataFrame(index=dates)
First ensure that the SPY dataframe is indexed by the Date column, not by the default integer index. Also ensure that 'nan' strings are interpreted as "not a number" rather than as text.
dfSPY = pd.read_csv("data/SPY.csv", index_col="Date", parse_dates=True, na_values=['nan'])
Join the two dataframes using DataFrame.join()
df1 = df1.join(dfSPY)
How do you drop NaN values on a data frame (df1)?
df1 = df1.dropna()
How do you drop NaN values when combining two dataframes (ie. df1 and dfSPY)?
df1 = df1.join(dfSPY, how='inner')
What is the default operation for the “how” parameter in the dataframe.join function?
The default option is left which indicates that the calling dataframe’s index will be used. Therefore, any dates from the calling dataframe will be preserved, potentially yielding NaN values if not shared by the other dataframe.
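A minimal sketch of the difference between the default left join and an inner join. The SPY prices here are hypothetical placeholders built in memory, since no data file is assumed to be available.

```python
import pandas as pd

dates = pd.date_range("2010-01-22", "2010-01-26")  # Friday through Tuesday
df1 = pd.DataFrame(index=dates)

# Hypothetical SPY prices for the three trading days in that range.
dfSPY = pd.DataFrame(
    {"SPY": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2010-01-22", "2010-01-25", "2010-01-26"]),
)

left = df1.join(dfSPY)                # default how='left': keeps all 5 dates
inner = df1.join(dfSPY, how="inner")  # keeps only dates present in both

print(left)   # the weekend rows hold NaN
print(inner)  # the weekend rows are gone
```

Calling left.dropna() gives the same rows as the inner join.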
How can you read in multiple stocks into one dataframe though they may contain the same column names?
symbols = ['GOOG', 'IBM', 'GLD']
for symbol in symbols:
    df_temp = pd.read_csv("data/{}.csv".format(symbol), index_col='Date', parse_dates=True, usecols=['Date', 'Adj Close'], na_values=['nan'])
    # Rename 'Adj Close' to the symbol so the column names do not clash
    df_temp = df_temp.rename(columns={'Adj Close': symbol})
    df1 = df1.join(df_temp)
In a dataframe (df) containing multiple symbols, how would you drop dates on which SPY did not trade?
# inside the loop that joins each symbol
if symbol == 'SPY':
    df = df.dropna(subset=['SPY'])
How do you select the slice of data from 2010-02-13 to 2010-02-15 for only GOOG and GLD?
df = df.loc['2010-02-13':'2010-02-15', ['GOOG', 'GLD']]
(.ix was removed in pandas 1.0; use .loc for labels and .iloc for positions)
What is the best way to normalize price data so that all prices start at 1.0?
df1 = df1 / df1.iloc[0]
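A short sketch of that normalization on a tiny frame. The prices are made up for illustration; dividing by the first row broadcasts across all rows, so each column starts at 1.0.

```python
import pandas as pd

# Made-up prices for two symbols over three days.
df1 = pd.DataFrame(
    {"SPY": [100.0, 102.0, 101.0], "GLD": [50.0, 55.0, 60.0]},
    index=pd.date_range("2010-01-04", periods=3),
)

normed = df1 / df1.iloc[0]  # divide every row by the first row
print(normed)
```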
Slice and plot SPY and IBM over the date range '2010-03-01' to '2010-04-01'
start_index = '2010-03-01'
end_index = '2010-04-01'
columns = ['SPY', 'IBM']
plot_data(df.loc[start_index:end_index, columns], title="title")
…
def plot_data(df, title="title"):
    ax = df.plot(title=title, fontsize=2)
    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    plt.show()
How do you normalize a dataframe df?
df = df / df.iloc[0, :]
How do you return the number of rows in an array, a?
a.shape[0]
How do you return the number of columns in an array, a?
a.shape[1]
How do you get the number of items in an array, a?
a.size
How do you get the sum of elements of an array, a?
a.sum()
How do you get the sum of each column of an array, a?
a.sum(axis = 0)
How do you get the sum of each row of an array, a?
a.sum(axis =1)
How do you get the location of the maximum value of an array, a?
a.argmax()
In an array a, how would you get all rows of every other column, stopping before column index 3?
a[:, 0:3:2]
where 0 indicates start at the first column
3 indicates stop before column index 3 (the 4th column)
2 indicates choose every second column
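The array cards above can be checked on a small example. This sketch uses an arbitrary 3 x 4 array of consecutive integers chosen only for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns, values 0..11

n_rows, n_cols = a.shape   # 3 and 4
n_items = a.size           # 12
col_sums = a.sum(axis=0)   # one sum per column
row_sums = a.sum(axis=1)   # one sum per row
max_pos = a.argmax()       # flat index of the largest value

sub = a[:, 0:3:2]          # all rows; columns 0 and 2
print(sub)
```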
How do you index an array, a, with another array (here called indices)?
a = np.random.randint(0, 10, size=5)   # e.g. a = [7, 6, 8, 5, 9]
indices = np.array([1, 1, 2, 3])
a[indices]                             # for that a: [6, 6, 8, 5]
How would I access all elements in this array >5 ?
a = np.array( [1, 6, 5, 3, 8] )
a [a > 5]
How do you compute the daily returns of a dataframe, df?
The daily returns are the net earnings compared to the previous day.
daily_returns = df.copy()
daily_returns[1:] = (df[1:] / df[:-1].values) - 1
daily_returns.iloc[0, :] = 0
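A small sketch of the daily return calculation on made-up prices. It also shows DataFrame.pct_change() as an alternative that gives the same numbers; that alternative is an addition here, not something the card itself uses.

```python
import pandas as pd

# Made-up prices: up 10% on day 2, down 10% on day 3.
df = pd.DataFrame(
    {"SPY": [100.0, 110.0, 99.0]},
    index=pd.date_range("2010-01-04", periods=3),
)

daily_returns = df.copy()
# .values strips the index so pandas divides by position, not by date label
daily_returns.iloc[1:] = (df.iloc[1:] / df.iloc[:-1].values) - 1
daily_returns.iloc[0, :] = 0  # no previous day for the first row

alt = df.pct_change().fillna(0)  # same result using the built-in helper
print(daily_returns)
```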
What is a Bollinger band?
A way of quantifying how far a stock price has deviated from some norm.
Where are the Bollinger bands?
2 standard deviations above and below the rolling mean. When the price crosses below the lower band, this could indicate a buy signal. When the price crosses above the upper band, this could indicate a sell signal.
How do you calculate Bollinger bands?
upper band = rolling mean + 2 * rolling std
lower band = rolling mean - 2 * rolling std
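The band formulas can be sketched with pandas rolling windows. The prices are made up, and the window of 3 is an assumption chosen to keep the example small (a 20-day window is a common choice in practice).

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 13.0], name="SPY")  # made-up
window = 3  # assumed window size for this example

rm = prices.rolling(window=window).mean()    # rolling mean
rstd = prices.rolling(window=window).std()   # rolling (sample) standard deviation

upper_band = rm + 2 * rstd
lower_band = rm - 2 * rstd
print(pd.DataFrame({"price": prices, "upper": upper_band, "lower": lower_band}))
```

The first window - 1 entries are NaN because there are not yet enough values to fill the window.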
What is an ETF?
An ETF or Exchange-Traded Fund is a basket of equities allocated in such a way that the overall portfolio tracks the performance of a stock exchange index. ETFs can be bought and sold on the market like shares.
How do you fill in missing data in a dataframe?
df = df.ffill()   (older pandas: df.fillna(method='ffill'))
df = df.bfill()   (older pandas: df.fillna(method='bfill'))
Fill forward first, then fill backward.
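A minimal sketch of why the order matters, on a made-up column with gaps at the start, middle and end. Forward fill cannot fill the leading gap because nothing precedes it, so backward fill handles that one afterwards.

```python
import numpy as np
import pandas as pd

# Made-up prices with missing values at the start, middle and end.
df = pd.DataFrame({"GLD": [np.nan, 10.0, np.nan, 12.0, np.nan]})

forward_only = df.ffill()     # leading NaN is still missing
filled = df.ffill().bfill()   # forward first, then backward
print(filled)
```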
What is kurtosis?
It tells us about the tails of the distribution.
It tells us how different our distribution is from the Gaussian distribution.
A positive kurtosis means more occurrences in the tails than a Gaussian would predict (fat tails).
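As a rough illustration, the sketch below compares a Gaussian sample with a fat-tailed one. The Laplace distribution is an arbitrary stand-in for fat-tailed returns, and the seed and sample size are assumptions made so the run is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 100_000

gaussian = pd.Series(rng.normal(size=n))
fat_tailed = pd.Series(rng.laplace(size=n))  # heavier tails than a Gaussian

# pandas reports excess kurtosis, which is close to 0 for a Gaussian sample.
k_gauss = gaussian.kurtosis()
k_fat = fat_tailed.kurtosis()
print(k_gauss, k_fat)
```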
How would you draw a scatter plot of 'SPY' and 'GLD' data?
df.plot(kind='scatter', x='SPY', y='GLD')
How do we fit a polynomial of degree 1 to a graph?
beta, alpha = np.polyfit(dailyret['SPY'], dailyret['XOM'], 1)
plt.plot(dailyret['SPY'], beta * dailyret['SPY'] + alpha)
plt.show()
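A sketch of the degree-1 fit without plotting. The returns are made up, and the XOM series is constructed as an exact line through the SPY series so the recovered slope and intercept are known in advance.

```python
import numpy as np

spy = np.array([0.01, -0.02, 0.03, 0.00, -0.01])  # made-up daily returns
xom = 1.5 * spy + 0.002                           # constructed: slope 1.5, intercept 0.002

# polyfit returns coefficients from highest degree down: slope first, then intercept.
beta, alpha = np.polyfit(spy, xom, 1)
fitted = beta * spy + alpha

corr = np.corrcoef(spy, xom)[0, 1]  # Pearson correlation of the two series
print(beta, alpha, corr)
```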
How do you find the correlation on a dataframe?
df.corr(method='pearson')
How do you calculate the daily portfolio value?
1) normalize prices: normed = prices / prices.iloc[0]
2) apply allocations: alloced = normed * allocs
3) determine position values: pos_vals = alloced * start_val
4) determine portfolio values: port_val = pos_vals.sum(axis=1)
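The four steps can be sketched on a two-day, two-symbol example. Prices, allocations and starting value are all made-up numbers chosen so the result is easy to check by hand.

```python
import pandas as pd

# Made-up prices: SPY rises 10%, GLD falls 10%.
prices = pd.DataFrame(
    {"SPY": [100.0, 110.0], "GLD": [50.0, 45.0]},
    index=pd.date_range("2010-01-04", periods=2),
)
allocs = [0.6, 0.4]   # assumed allocation, one entry per column
start_val = 1000.0    # assumed starting portfolio value

normed = prices / prices.iloc[0]   # 1) normalize
alloced = normed * allocs          # 2) apply allocations column by column
pos_vals = alloced * start_val     # 3) position values
port_val = pos_vals.sum(axis=1)    # 4) daily portfolio value
print(port_val)
```

By hand: day 2 is 0.6 * 1.1 * 1000 + 0.4 * 0.9 * 1000 = 660 + 360 = 1020.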
What is the Sharpe ratio?
Risk adjusted return
All else being equal:
lower risk is better
higher return is better
SR also considers risk free rate of return
What is the formula for the Sharpe ratio?
SR = (Rp - Rf) / StdDev
Rp - portfolio return
Rf - risk free rate of return
StdDev - std dev of portfolio return
SR = ExpectedVal[Rp - Rf] / Std[Rp - Rf]
SR = mean[daily_rets - daily_rf] / std[daily_rets - daily_rf]
Using the shortcut and treating daily_rf as a constant:
SR = mean[daily_rets - daily_rf] / std[daily_rets]
How do you convert the annual risk free rate into a daily amount?
daily_rf = (1.0 + annual_rf) ^ (1/252) - 1, i.e. the 252nd root of (1.0 + annual risk free rate), minus 1
What do you do if the SR varies with how often you sample (daily, weekly, monthly)?
Consider SR an annual measure
SR_annualized = K * SR
K = sqrt(# samples per year)
For daily data: SR = sqrt(252) * mean(daily_rets - daily_rf) / std(daily_rets)
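A sketch of the annualized Sharpe ratio from daily data. The daily returns and the 2% annual risk free rate are made-up inputs, and using the sample standard deviation (ddof=1, which is also the pandas default) is an assumption.

```python
import numpy as np

daily_rets = np.array([0.01, -0.005, 0.002, 0.007, -0.003])  # made-up daily returns
annual_rf = 0.02                                             # assumed annual risk free rate

# 252nd root of (1 + annual rate), minus 1
daily_rf = (1.0 + annual_rf) ** (1.0 / 252) - 1

k = np.sqrt(252)  # daily sampling: 252 samples per year
sharpe = k * np.mean(daily_rets - daily_rf) / np.std(daily_rets, ddof=1)
print(daily_rf, sharpe)
```

Compounding daily_rf over 252 days gives back the annual rate, which is a quick way to check the conversion.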
How do you limit an optimizer to useful data?
Ranges - limits on X
Constraints - properties that must be true
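With scipy.optimize.minimize, ranges map to the bounds argument and constraints to the constraints argument. The sketch below uses a hypothetical objective (squared distance to a target point) purely to show the mechanics; the allocations must each lie in [0, 1] and sum to 1.

```python
import numpy as np
import scipy.optimize as spo

def f(allocs):
    # Hypothetical objective: squared distance from a target that violates the constraint
    target = np.array([0.9, 0.4])
    return np.sum((allocs - target) ** 2)

bounds = [(0.0, 1.0), (0.0, 1.0)]  # ranges: limits on each X
constraints = {"type": "eq", "fun": lambda allocs: np.sum(allocs) - 1.0}  # must sum to 1

result = spo.minimize(f, [0.5, 0.5], method="SLSQP",
                      bounds=bounds, constraints=constraints)
print(result.x)  # closest point to the target that satisfies both
```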
What is an optimizer?
- Find minimum values of functions
- Build parameterized models based on data
- Refine allocations to stocks in portfolios
How do you use an optimizer?
1) Provide a function to minimize
2) Provide an initial guess
3) Call the optimizer
What is the python library to optimize a function?
scipy.optimize
import scipy.optimize as spo
min_result = spo.minimize(func, Xguess, method="SLSQP", options={'disp': True})
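A minimal end-to-end sketch of the three steps (function, initial guess, call). The parabola f is an illustrative function with a known minimum at X = 1.5, not something taken from the cards.

```python
import scipy.optimize as spo

def f(X):
    """Illustrative parabola with its minimum at X = 1.5, where f = 0.5."""
    return (X[0] - 1.5) ** 2 + 0.5

Xguess = [2.0]  # initial guess
min_result = spo.minimize(f, Xguess, method="SLSQP", options={"disp": False})

print(min_result.x, min_result.fun)  # location and value of the minimum found
```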
What is a convex function?
A real-valued function f(x) defined on an interval is called convex if the line segment between any two points on the graph of the function lies above the graph.
How do you build a parameterized model?
Figure out what you are minimizing.
Minimize the error
What are the types of funds
ETFs - Buy/sell like stocks, baskets of stocks, transparent
Mutual Fund - Buy/sell at end of day, quarterly disclosure, less transparent
Hedge Fund - buy/sell by agreement, no disclosure, not transparent
What is liquid?
Ease with which one can buy or sell shares in a holding
ETFs are liquid
What is large cap?
How much the company is worth in terms of # shares x price
Price of the stock is related to what a share is selling at.
How can you tell what type a fund is?
ETFs - 3/4 letters
Mutual Funds - 5 letter
Hedge Funds - name
How are the managers of these funds compensated?
ETF
Mutual Funds
Hedge Funds
ETFs - Expense Ratio in terms of AUM (0.01 to 1%), tied to an index
Mutual Funds - Expense Ratio (0.5 to 3%)
Hedge Funds - Two and Twenty (2% of AUM and 20% of profits)
AUM - Assets Under Management is the total amount of money being managed by the fund.
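The fee structures reduce to simple arithmetic. In this sketch the function names, the fund size and the profit are hypothetical, and charging no performance fee in a losing year is an assumption rather than something the cards state.

```python
def expense_ratio_fee(aum, ratio):
    """Annual fee for an ETF or mutual fund: a fixed fraction of AUM."""
    return ratio * aum

def two_and_twenty_fee(aum, profit, mgmt_rate=0.02, perf_rate=0.20):
    """Hedge fund fee: 2% of AUM plus 20% of profits.

    Assumption: no performance fee is charged when profit is negative.
    """
    return mgmt_rate * aum + perf_rate * max(profit, 0.0)

aum = 100_000_000.0    # hypothetical fund with $100M under management
profit = 15_000_000.0  # hypothetical $15M profit for the year

etf_fee = expense_ratio_fee(aum, 0.01)      # 1% expense ratio
hedge_fee = two_and_twenty_fee(aum, profit)
print(etf_fee, hedge_fee)
```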
What types of investors use hedge funds?
Individuals
Institutions - retirement funds, university foundations
Funds of funds - group together funds of individuals or institutions
What are hedge fund goals and metrics?
1) Beat a benchmark
2) Absolute returns
What is an order?
Buy or sell info
Symbol
shares
Limit or Market ( market means accept a good market price, limit price means no worse than a certain price)
Price
How do orders get to the exchange?
you -> broker -> exchange
you -> broker <- joe (the broker matches orders between its own clients)
you -> broker -> dark pool <- broker2 <- lisa
What are broker order types?
Stop loss
Stop gain
Trailing stop
Selling Short
What is the value of a future dollar?
PV = FV / (1 + IR)^i
PV - present value
FV - future value
IR - interest rate
i - number of years into the future
What is the difference between the interest rate and discount rate?
Interest rate is used with a given present value, to figure out what the future value would be
Discount rate is used when we have a known or desired Future Value and want to compute the corresponding present value.
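The two directions are the same formula rearranged, as this sketch shows. The function names and the 5% rate are illustrative choices.

```python
def future_value(pv, ir, i):
    """Grow a present value forward i years at interest rate ir."""
    return pv * (1.0 + ir) ** i

def present_value(fv, dr, i):
    """Discount a future value back i years at discount rate dr."""
    return fv / (1.0 + dr) ** i

pv = present_value(100.0, 0.05, 1)  # value today of $100 received in one year
fv = future_value(pv, 0.05, 1)      # growing it forward recovers the $100
print(pv, fv)
```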
What is the intrinsic value of a company?
FV / DR
Future Value / Discount Rate
What is the value of a company given its dividend and discount rate?
Dividend = d
Discount Rate = dr
Value = d / dr
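One way to see where d / dr comes from: it is the limit of the present values of a dividend d paid every period forever. The sketch below checks this numerically; the $1 dividend, the 5% discount rate and the 1000-period cutoff are assumptions for illustration.

```python
def intrinsic_value(d, dr):
    """Closed form: value of a perpetual dividend d at discount rate dr."""
    return d / dr

def discounted_dividends(d, dr, periods):
    """Sum of the present values of the first `periods` dividend payments."""
    return sum(d / (1.0 + dr) ** i for i in range(1, periods + 1))

closed_form = intrinsic_value(1.0, 0.05)
partial_sum = discounted_dividends(1.0, 0.05, 1000)
print(closed_form, partial_sum)  # the partial sum approaches the closed form
```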
What is book value?
Total assets minus intangible assets and liabilities
What is market capitalization?
# of shares * price
What is a portfolio?
A weighted set of assets.
Wi is the portion of funds in asset i
the sum of absolute value of the weights is 1.0
What is the equation for the return on a portfolio?
The weight * the return summed for all assets
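A sketch of the weighted sum with made-up numbers. The negative weight represents a short position, which is why the card sums absolute values of the weights.

```python
import numpy as np

weights = np.array([0.75, -0.25])  # made-up: 75% long asset A, 25% short asset B
returns = np.array([0.01, -0.02])  # made-up returns: A up 1%, B down 2%

portfolio_return = np.sum(weights * returns)  # sum of weight_i * return_i
abs_weight_total = np.abs(weights).sum()      # should be 1.0
print(portfolio_return, abs_weight_total)
```

Here the short position gains when asset B falls: 0.75 * 0.01 + (-0.25) * (-0.02) = 0.0125.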
What is the market portfolio?
An index that covers a large portion of stocks
US: SP500.
An index can be thought of as the “ocean” when malaise occurs
Indexes are cap weighted, where the weight of a stock is its market cap / the sum of all market caps.
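Cap weighting in a few lines. The tickers and market caps are hypothetical.

```python
# Hypothetical tickers and market caps (same currency units).
market_caps = {"AAA": 300.0, "BBB": 100.0, "CCC": 100.0}

total_cap = sum(market_caps.values())
cap_weights = {sym: cap / total_cap for sym, cap in market_caps.items()}
print(cap_weights)  # the largest company gets the largest weight
```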
What is the CAPM equation?
The return for a stock on day t is equal to Beta times the return on the market on day t plus alpha on that day.
Ri (t) = Bi * Rm(t) + Ai (t)
Beta component - market, SLOPE!
Alpha component - residual, y INT!
CAPM says that alpha is expected to be 0.
What is CAPM vs Active Management?
Passive - buy index and hold
Active - pick stocks (over/under weight stocks)
What is the difference between CAPM and Active investors?
CAPM says that alpha is random and Expected (alpha) = 0
Active managers believe they can predict alpha, at least more than a coin flip.
What are the implications of CAPM?
The only way to beat the market is by choosing Beta
Expected value of alpha = 0
The Efficient Markets Hypothesis says you can't predict the market.
What is Arbitrage Pricing Theory (APT)?
We ought to consider multiple betas.
Beta for different sectors
Why do stocks split?
The price is too high
What are the problems with regression based forecasting?
1) Noisy and uncertain forecasts
2) Challenging to estimate confidence
3) Holding time, allocation
Pros and cons of parametric vs non-parametric learners
Parametric:
slow training
query fast
Non parametric:
training fast
query slow
complex patterns with no underlying model
What is cross validation?
Splitting data into many chunks to create different test/train data.
It does not work well with financial data because it is time sensitive.
What is overfitting?
When in-sample error is decreasing and out-of-sample error is increasing