Midterm Flashcards

Question 1

Q

What are CSV files?

Answer

A

CSV - Comma Separated Values

Header Row, separated by commas

Data Rows, separated by commas
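
As a minimal sketch of the header-row and data-row layout, the snippet below parses a small in-memory CSV with pandas. The column names and values are made up for illustration; `io.StringIO` stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV text: one header row, then one data row per day.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2010-01-04,10.0,11.0,9.5,10.5,1000,10.2
2010-01-05,10.5,11.5,10.0,11.0,1200,10.7
"""

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.columns.tolist())  # names taken from the header row
print(df_csv.shape)             # (number of data rows, number of columns)
```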

Question 2

Q

Which field would you expect to see in a CSV file of stock data?

# of employees
Date/Time
Company Name
Price of the stock
Company’s hometown

Answer

A

Date/time
Price of the stock

Question 3

Q

What does real stock data look like?

Answer

A

Header: Date, Open, High, Low, Close, Volume, Adjusted Close

Close - closing price reported at exchange

Volume - volume sold

Adjusted Close - A number the data provider computes to account for stock splits and dividend payments. Looking back in time, the rate of return computed from Adjusted Close should be larger than the one computed from Close.

Question 4

Q

What is a data frame?

Answer

A

Columns represent the stock symbols (separate DataFrames can hold different dimensions of the data: Adj Close, Volume, Close, etc.)

Rows represent time

Question 5

Q

What pandas code would allow you to print the first or last 5 rows of the DataFrame df?

df = pd.read_csv("data/AAPL.csv")

Answer

A

First 5 rows:

print(df.head())

Last 5 rows:

print(df.tail())

Last n rows:

print(df.tail(n))

Question 6

Q

How do you view specific rows of a data frame between two arbitrary positions? For example, rows 10 through 20?

df = pd.read_csv("data/AAPL.csv")

Answer

A

print(df[10:21])

Note that the second number is not inclusive in the range
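
A small sketch of the slicing rule, using a made-up 30-row frame in place of the AAPL data (the frame and its `Close` values are hypothetical):

```python
import pandas as pd

# Hypothetical frame with 30 rows, standing in for the AAPL data.
df_rows = pd.DataFrame({'Close': list(range(30))})

subset = df_rows[10:21]   # positions 10, 11, ..., 20; position 21 is excluded

print(len(subset))
print(subset.index[0], subset.index[-1])
```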

Question 7

Q

How do you compute the max closing price for a stock using pandas?

df = pd.read_csv("data/{}.csv".format(symbol))

Answer

A

max_value = df['Close'].max()

Question 8

Q

How do you compute the mean volume for a symbol?

df = pd.read_csv("data/{}.csv".format(symbol))

Answer

A

mean_volume = df['Volume'].mean()

Question 9

Q

How would you plot the adjusted close of the following data?

df = pd.read_csv("data/AAPL.csv")

print(df['Adj Close'])

Answer

A

df['Adj Close'].plot()

plt.show()

Question 10

Q

Select the ‘High’ column from the dataframe and then plot it.

df = pd.read_csv("data/XXX.csv")

Answer

A

print(df['High'])

df['High'].plot()

plt.show()

Question 11

Q

How do you plot two columns, such as 'Close' and 'Adj Close'?

df = pd.read_csv("data/AAPL.csv")

Answer

A

df[['Close', 'Adj Close']].plot()

Question 12

Q

How many days were US stocks traded at NYSE in 2014?

Answer

A

252

Question 13

Q

What is S&P 500 and what is SPY?

Answer

A

S&P 500 - Stock Market Index based on 500 large American companies listed on the NYSE or NASDAQ. Essentially a weighted mean of the stock prices of the companies

SPY - SPDR S&P 500 - An ETF (Exchange-Traded Fund) that tracks the S&P 500 index

Question 14

Q

How do you create an empty data frame (df1) with a given datetime range?

start_date = '2010-01-22'

end_date = '2010-01-26'

Answer

A

dates = pd.date_range(start_date, end_date)

df1 = pd.DataFrame( index = dates)

Question 15

Q

Using an empty data frame (df1) with a specified date range, how do you join df1 to a data frame for SPY (dfSPY)?

df1 = pd.DataFrame( index = dates )

Answer

A

Ensure first that the SPY dataframe is indexed by the Date column, not the default numbered index. Additionally, ensure that 'nan' values are interpreted as "not a number" and not as strings

dfSPY = pd.read_csv("data/SPY.csv", index_col="Date", parse_dates=True, na_values=['nan'])

Join the two dataframes using DataFrame.join()

df1 = df1.join(dfSPY)

Question 16

Q

How do you drop NaN values on a data frame (df1)?

Answer

A

df1 = df1.dropna()

Question 17

Q

How do you drop NaN values when combining two dataframes (e.g. df1 and dfSPY)?

Answer

A

df1 = df1.join(dfSPY, how='inner')

Question 18

Q

What is the default operation for the “how” parameter in the dataframe.join function?

Answer

A

The default option is left which indicates that the calling dataframe’s index will be used. Therefore, any dates from the calling dataframe will be preserved, potentially yielding NaN values if not shared by the other dataframe.
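
To make the difference between the default left join and an inner join concrete, here is a minimal sketch. The SPY prices are made up; the date range is the one from Question 14, which spans a weekend.

```python
import pandas as pd

dates = pd.date_range('2010-01-22', '2010-01-26')  # Fri..Tue, includes a weekend
df1 = pd.DataFrame(index=dates)

# Hypothetical SPY prices for the three trading days in that range.
dfSPY = pd.DataFrame(
    {'SPY': [100.0, 101.0, 102.0]},
    index=pd.to_datetime(['2010-01-22', '2010-01-25', '2010-01-26']),
)

left = df1.join(dfSPY)                # how='left' is the default
inner = df1.join(dfSPY, how='inner')  # keeps only dates present in both

print(left)   # all 5 dates from df1, NaN on the weekend
print(inner)  # only the 3 shared dates, no NaN
```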

Question 19

Q

How can you read in multiple stocks into one dataframe though they may contain the same column names?

Answer

A

symbols = ['GOOG', 'IBM', 'GLD']

for symbol in symbols:

    df_temp = pd.read_csv("data/{}.csv".format(symbol), index_col='Date', parse_dates=True, usecols=['Date', 'Adj Close'], na_values=['nan'])

    # Rename the 'Adj Close' column to the symbol to avoid a name clash

    df_temp = df_temp.rename(columns={'Adj Close': symbol})

    df1 = df1.join(df_temp)

Question 20

Q

In a dataframe (df) containing multiple symbols,

how would you drop dates in which SPY did not trade?

Answer

A

if symbol == 'SPY':

    df = df.dropna(subset=['SPY'])

Question 21

Q

How do you select the slice of data covering 2010-02-13 to 2010-02-15 for only GOOG and GLD?

Answer

A

df = df.loc['2010-02-13':'2010-02-15', ['GOOG', 'GLD']]
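
A short sketch of label-based row and column selection with `.loc` (which replaces the removed `.ix` indexer in current pandas). The frame and its values are hypothetical; note that a `.loc` date slice includes both endpoints.

```python
import pandas as pd

dates = pd.date_range('2010-02-10', '2010-02-20')  # 11 calendar days
df = pd.DataFrame(
    {
        'SPY': list(range(11)),
        'GOOG': list(range(11, 22)),
        'GLD': list(range(22, 33)),
    },
    index=dates,
)

# Rows are selected by date label, columns by name.
sel = df.loc['2010-02-13':'2010-02-15', ['GOOG', 'GLD']]
print(sel)
```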

Question 22

Q

What is the best way to normalize price data so that all prices start at 1.0?

Answer

A

df1 = df1 / df1.iloc[0]

OR

df1 = df1 / df1.iloc[0, :]
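
A minimal sketch of normalizing by the first row, with made-up prices. Dividing a DataFrame by its first row broadcasts that row across every date, so each column starts at 1.0.

```python
import pandas as pd

# Hypothetical prices for two symbols over three days.
prices = pd.DataFrame({'SPY': [100.0, 110.0, 105.0],
                       'GLD': [50.0, 55.0, 60.0]})

normed = prices / prices.iloc[0]  # first row is broadcast across all rows
print(normed)
```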

Question 23

Q

Slice and plot SPY and IBM over the daterange ‘2010-03-01’ to ‘2010-04-01’

Answer

A

start_index = '2010-03-01'

end_index = '2010-04-01'

columns = ['SPY', 'IBM']

plot_data(df.loc[start_index:end_index, columns], title="title")

df.plot()
plt.show()

…

def plot_data(df, title="title"):

    ax = df.plot(title=title, fontsize=2)

    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    plt.show()

Question 24

Q

How do you normalize a dataframe df?

Answer

A

df = df / df.iloc[0, :]

Question 25

Q

Question 26

Q

How do you return the number of rows in an array, a?

Answer

A

a.shape[0]

Question 27

Q

How do you return the number of columns in an array, a?

Answer

A

a.shape[1]

Question 28

Q

How do you get the number of items in an array, a?

Answer

A

a.size

Question 29

Q

How do you get the sum of elements of an array, a?

Answer

A

a.sum()

Question 30

Q

How do you get the sum of each column of an array, a?

Answer

A

a.sum(axis = 0)

Question 31

Q

How do you get the sum of each row of an array, a?

Answer

A

a.sum(axis =1)

Question 32

Q

How do you get the location of the maximum value of an array, a?

Answer

A

a.argmax()

Question 33

Q

In an array a, how would you get the entire row of every other column up to the 3rd column?

Answer

A

a[:, 0:3:2]

where 0 indicates start at the first column (index 0)

3 indicates stop before column index 3

2 indicates take every second column
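
The array attributes and slices from the last few cards can be checked on a small made-up array. This is only an illustrative sketch; the 3 x 4 array is an assumption.

```python
import numpy as np

# Hypothetical 3 x 4 array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
a = np.arange(12).reshape(3, 4)

n_rows = a.shape[0]        # number of rows
n_cols = a.shape[1]        # number of columns
col_sums = a.sum(axis=0)   # one sum per column
row_sums = a.sum(axis=1)   # one sum per row
flat_max = a.argmax()      # index of the max in the flattened array
sliced = a[:, 0:3:2]       # all rows, columns 0 and 2

print(sliced)
```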

Question 34

Q

How do you index an array, a, with another array, b?

Answer

A

a = np.random.randint(0, 10, size=5)

indices = np.array([1, 1, 2, 3])

For example, if a = [7, 6, 8, 5, 9]

then a[indices] returns [6, 6, 8, 5]

Question 35

Q

How would I access all elements in this array >5 ?

a = np.array( [1, 6, 5, 3, 8] )

Answer

A

a [a > 5]

Question 36

Q

How do you compute the daily returns of a dataframe, df?

Answer

A

The daily return is the percentage change in price relative to the previous day: (price[t] / price[t-1]) - 1.

daily_returns = df.copy()

daily_returns[1:] = (df[1:] / df[:-1].values) - 1

daily_returns.iloc[0, :] = 0
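
As a cross-check on the daily return formula, here is a small sketch on made-up prices. It uses `shift(1)` rather than the slicing form, which is a swapped-in but equivalent way to line up each price with the previous day's price; `pct_change()` gives the same numbers.

```python
import pandas as pd

# Hypothetical prices: up 10% on day 1, then down 10% on day 2.
df = pd.DataFrame({'SPY': [100.0, 110.0, 99.0]})

daily_returns = (df / df.shift(1)) - 1  # price[t] / price[t-1] - 1
daily_returns.iloc[0, :] = 0            # no previous day for the first row

# Built-in equivalent.
alt = df.pct_change().fillna(0)

print(daily_returns)
```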

Question 37

Q

What is a bollinger band?

Answer

A

A way of quantifying how far a stock price has deviated from some norm.

Question 38

Q

Where are the bollinger bands?

Answer

A

2 standard deviations above and below the rolling mean of the data. When the price crosses below the lower band, this could indicate a buy signal. When the price crosses above the upper band, this could indicate a sell signal.

Question 39

Q

How do you calculate bollinger bands?

Answer

A

upper band = rolling mean + 2 * rolling std

lower band = rolling mean - 2 * rolling std
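
A minimal sketch of the band calculation with pandas rolling windows. The price series and the window length of 4 are assumptions chosen to keep the example small.

```python
import pandas as pd

# Hypothetical price series.
prices = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 9.0, 10.0, 11.0])
window = 4  # hypothetical window length

rm = prices.rolling(window=window).mean()    # rolling mean
rstd = prices.rolling(window=window).std()   # rolling standard deviation

upper_band = rm + 2 * rstd
lower_band = rm - 2 * rstd

# The first window-1 entries are NaN because the window is not yet full.
print(pd.DataFrame({'price': prices, 'upper': upper_band, 'lower': lower_band}))
```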

Question 40

Q

What is an ETF?

Answer

A

An ETF or Exchange-Traded Fund is a basket of equities allocated in such a way that the overall portfolio tracks the performance of a stock exchange index. ETFs can be bought and sold on the market like shares.

Question 41

Q

How do you fill in missing data in a dataframe?

Answer

A

df = df.ffill()   # same as df.fillna(method='ffill') in older pandas
df = df.bfill()   # same as df.fillna(method='bfill') in older pandas

Fill forward first then fill backwards.
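
A small sketch of the fill order on a made-up column with gaps at the start, middle, and end. Forward fill handles the interior and trailing gaps; backward fill then covers the leading gap that forward fill cannot reach.

```python
import numpy as np
import pandas as pd

# Hypothetical column with missing values at the start, middle, and end.
gappy = pd.DataFrame({'FAKE': [np.nan, 1.0, np.nan, 3.0, np.nan]})

filled = gappy.ffill().bfill()  # forward fill first, then backward fill
print(filled)
```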

Question 42

Q

What is kurtosis?

Answer

A

It tells us about the tails of the distribution.

It tells us how different our distribution is from the Gaussian distribution.

A positive kurtosis means more occurrences in the tails than a Gaussian distribution would predict (fat tails).

Question 43

Q

How would you draw a scatter plot of 'SPY' and 'GLD' data?

Answer

A

df.plot(kind='scatter', x='SPY', y='GLD')

Question 44

Q

How do we fit a polynomial of degree 1 to a graph?

Answer

A

beta, alpha = np.polyfit(dailyret['SPY'], dailyret['XOM'], 1)

plt.plot(dailyret['SPY'], beta * dailyret['SPY'] + alpha)
plt.show()
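
To see that `np.polyfit` with degree 1 returns the slope first and the intercept second, the sketch below fits synthetic returns that lie exactly on a known line. The slope 0.8 and intercept 0.001 are made-up values, not real market data.

```python
import numpy as np

# Synthetic "market" returns and a stock that follows them exactly.
spy = np.linspace(-0.02, 0.02, 21)
xom = 0.8 * spy + 0.001   # hypothetical beta = 0.8, alpha = 0.001

# Degree-1 fit: coefficients come back highest power first (slope, intercept).
beta, alpha = np.polyfit(spy, xom, 1)
print(beta, alpha)
```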

Question 45

Q

How do you find the correlation on a dataframe?

Answer

A

df.corr(method='pearson')

Question 46

Q

How do you calculate the daily portfolio value?

Answer

A

1) normalize prices: normed = prices / prices.iloc[0]
2) apply allocations: alloced = normed * allocs
3) determine position values: pos_vals = alloced * start_val
4) determine portfolio values: port_val = pos_vals.sum(axis = 1)
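
The four steps above can be sketched on a tiny made-up portfolio. The prices, the 60/40 allocation, and the starting value of 1000 are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical prices for two symbols over two days.
prices = pd.DataFrame({'SPY': [100.0, 110.0], 'GLD': [50.0, 45.0]})
allocs = [0.6, 0.4]   # hypothetical allocations, one per column, summing to 1
start_val = 1000.0    # hypothetical starting portfolio value

normed = prices / prices.iloc[0]   # 1) every column starts at 1.0
alloced = normed * allocs          # 2) scale each column by its allocation
pos_vals = alloced * start_val     # 3) value of each position per day
port_val = pos_vals.sum(axis=1)    # 4) total portfolio value per day

print(port_val)
```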

Question 47

Q

What is the Sharpe ratio?

Answer

A

Risk adjusted return

All else being equal:

lower risk is better

higher return is better

SR also considers risk free rate of return

Question 48

Q

What is the formula for the Sharpe ratio?

Answer

A

SR = ( Rp - Rf ) / StdDev

Rp - portfolio return

Rf - risk-free rate of return

StdDev - standard deviation of portfolio return

SR = ExpectedVal [Rp - Rf] / Std [Rp - Rf]

SR = mean [daily_rets - daily_rf] / std [daily_rets - daily_rf]

Using the shortcut and treating daily_rf as a constant:

SR = mean [daily_rets - daily_rf] / std [daily_rets]

Question 49

Q

How do you convert the annual risk-free rate into a daily amount?

Answer

A

daily_rf = (1.0 + annual_rf) ^ (1/252) - 1, i.e. the 252nd root of (beginning value of 1.0 + annual risk-free rate), minus 1
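
A quick numeric sketch of the conversion, assuming a made-up annual rate of 10% and 252 trading days. Compounding the daily rate over 252 days should recover the annual rate.

```python
annual_rf = 0.10     # hypothetical annual risk-free rate
trading_days = 252

# 252nd root of (1 + annual rate), minus 1.
daily_rf = (1.0 + annual_rf) ** (1.0 / trading_days) - 1.0

# Compounding the daily rate for a year gives back the annual rate.
recovered = (1.0 + daily_rf) ** trading_days - 1.0
print(daily_rf, recovered)
```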

Question 50

Q

The SR varies depending on how frequently you sample. What do you do about it?

Answer

A

Consider SR an annual measure

SR_annualized = K * SR

K = sqrt ( #samples per year )

For daily data: SR_annualized = sqrt (252) * mean ( daily_rets - daily_rf ) / std ( daily_rets )
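
A minimal sketch of the annualized Sharpe ratio from daily returns. The return values are made up, the daily risk-free rate is assumed to be 0, and the sample standard deviation (`ddof=1`, the pandas default) is an assumption about which estimator to use.

```python
import numpy as np

# Hypothetical daily returns.
daily_rets = np.array([0.01, -0.005, 0.002, 0.007, -0.003])
daily_rf = 0.0              # assumed constant daily risk-free rate

k = np.sqrt(252)            # daily samples: 252 per year

# Shortcut form: daily_rf is constant, so it does not change the std.
sharpe = k * np.mean(daily_rets - daily_rf) / np.std(daily_rets, ddof=1)
print(sharpe)
```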

Question 51

Q

How do you limit an optimizer to useful data?

Answer

A

Ranges - limits on X

Constraints - properties that must be true

Question 52

Q

What is an optimizer?

Answer

A

Find minimum values of functions
Build parameterized models based on data
Refine allocations to stocks in portfolios

Question 53

Q

How do you use an optimizer?

Answer

A

1) Provide a function to minimize
2) Provide an initial guess
3) Call the optimizer

Question 54

Q

What is the python library to optimize a function?

Answer

A

scipy.optimize

import scipy.optimize as spo

min_result = spo.minimize(func, Xguess, method='SLSQP', options={'disp': True})
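
A self-contained sketch of the three steps (function, initial guess, call), using a made-up parabola whose minimum is known in advance. The function `f` and the guess of 2.0 are assumptions for illustration.

```python
import scipy.optimize as spo

def f(x):
    # Hypothetical function with its minimum at x = 1.5, where f(x) = 0.5.
    # minimize passes x as a 1-element array, so return a plain float.
    return float((x[0] - 1.5) ** 2 + 0.5)

Xguess = 2.0  # initial guess
min_result = spo.minimize(f, Xguess, method='SLSQP', options={'disp': False})

print(min_result.x, min_result.fun)  # location and value of the minimum found
```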

Question 55

Q

What is a convex function?

Answer

A

A real-valued function f(x) defined on an interval is called convex if the line segment between any two points on the graph of the function lies on or above the graph.

Question 56

Q

How do you build a parameterized model?

Answer

A

Figure out what you are minimizing.

Minimize the error

Question 57

Q

What are the types of funds?

Answer

A

ETFs - Buy/sell like stocks, baskets of stocks, transparent

Mutual Fund - Buy/sell at end of day, quarterly disclosure, less transparent

Hedge Fund - buy/sell by agreement, no disclosure, not transparent

Question 58

Q

What does liquid mean?

Answer

A

The ease with which one can buy or sell shares in a holding

ETFs are liquid

Question 59

Q

What is large cap?

Answer

A

Cap (market capitalization) is how much the company is worth in terms of #shares x price

The price of the stock only tells you what a single share is selling at, not what the whole company is worth.

Question 60

Q

How can you tell what type a fund is?

Answer

A

ETFs - 3/4 letters

Mutual Funds - 5 letters

Hedge Funds - name

Question 61

Q

How are the managers of these funds compensated?

ETF

Mutual Funds

Hedge Funds

Answer

A

ETFs - Expense Ratio in terms of AUM (0.01 to 1%), tied to an index

Mutual Funds - Expense Ratio (0.5 to 3%)

Hedge Funds - Two and Twenty (2% of AUM and 20% of profits)

AUM - Assets Under Management is the total amount of money being managed by the fund.

Question 62

Q

What types of investors use hedge funds?

Answer

A

Individuals

Institutions - retirement funds, university foundations

Funds of funds - group together funds of individuals or institutions

Question 63

Q

What are hedge fund goals and metrics?

Answer

A

1) Beat a benchmark
2) Absolute returns

Question 64

Q

What is an order?

Answer

A

Buy or sell info

Symbol

shares

Limit or Market ( market means accept a good market price, limit price means no worse than a certain price)

Price

Answer 61

A

you -> broker -> exchange

you -> broker then joe -> broker then joe -> you

you -> broker -> dark pool <- broker2 <- lisa

Answer 62

A

Stop loss

Stop gain

Trailing stop

Selling Short

Answer 63

A

PV = FV / (1 + IR) ^i

PV - present value

FV - future value
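
A worked example of the present value formula. The numbers are made up, and reading IR as the interest rate per period and i as the number of periods is an assumption, since the card does not define them.

```python
def present_value(fv, ir, i):
    """PV = FV / (1 + IR) ^ i, with IR per period and i periods (assumed)."""
    return fv / (1.0 + ir) ** i

# Hypothetical: 100 received one period from now, at a 5% rate.
pv = present_value(100.0, 0.05, 1)
print(pv)
```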

Answer 64

A

Interest rate is used with a given present value, to figure out what the future value would be

Discount rate is used when we have a known or desired Future Value and want to compute the corresponding present value.

Answer 65

A

FV / DR

Future Value / Discount Rate

Answer 66

A

Total assets minus intangible assets and liabilities

Answer 67

A

# of shares * price

Answer 68

A

A weighted set of assets.

Wi is the portion of funds in asset i

the sum of absolute value of the weights is 1.0

Answer 69

A

The weight * the return summed for all assets
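
A tiny sketch of the weighted sum, with made-up weights and single-period returns for two assets:

```python
# Hypothetical weights and returns for two assets.
weights = [0.75, 0.25]
returns = [0.01, -0.02]

# Portfolio return = sum over assets of weight * return.
port_ret = sum(w * r for w, r in zip(weights, returns))
print(port_ret)
```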

Answer 70

A

An index that covers a large portion of stocks

US: S&P 500.

An index can be thought of as the “ocean” when malaise occurs

Indexes are cap weighted, where the weight of a stock is its market cap / sum of all market caps.
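
A minimal sketch of cap weighting, using three made-up symbols and market caps:

```python
# Hypothetical market caps (any consistent unit).
market_caps = {'AAA': 300.0, 'BBB': 100.0, 'CCC': 100.0}

total_cap = sum(market_caps.values())

# Weight of each stock = its market cap / sum of all market caps.
cap_weights = {sym: cap / total_cap for sym, cap in market_caps.items()}
print(cap_weights)
```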

Answer 71

A

The return for a stock on day t is equal to Beta times the return on the market on day t plus alpha on that day.

Ri(t) = Bi * Rm(t) + Ai(t)

Beta component - market; the slope of the fitted line

Alpha component - residual; the y-intercept of the fitted line

CAPM says that alpha is expected to be 0.

Answer 72

A

Passive - buy index and hold

Active - pick stocks (over/under weight stocks)

Answer 73

A

CAPM says that alpha is random and Expected (alpha) = 0

Active managers believe they can predict alpha, at least more than a coin flip.

Answer 74

A

Only way to beat market is choose Beta

Expected value of alpha = 0

Efficient Markets Hypothesis says you can't predict the market.

Answer 75

A

We ought to consider multiple betas.

Beta for different sectors

Answer 76

A

The price is too high

Answer 77

A

1) Noisy and uncertain forecasts
2) Challenging to estimate confidence
3) Holding time, allocation

Answer 78

A

Parametric:

slow training

query fast

Non parametric:

training fast

query slow

complex patterns with no underlying model

Answer 79

A

Splitting data into many chunks to create different test/train data.

It does not work well with financial data because the data is time-ordered: training on later data and testing on earlier data lets the model peek into the future.

Answer 80

A

When in-sample error is decreasing and out-of-sample error is increasing