Pandas hw2b - Subqueries Flashcards
Using the daily stock data from 2010, return a list of the unique trading days in 2010.
df2010C = df2010C.copy()
df2010C.retdate.drop_duplicates()
Return the symbol, date, and the dollar volume traded for the highest dollar volume traded stocks in 2010 on the NYSE.
df2010C = df2010C.copy()
df2010C[ ‘dollarVol’ ] = df2010C.vol * df2010C.cls
df2010C = (df2010C.loc[ (df2010C.exch == ‘NYSE’) ]
.nlargest(1, ‘dollarVol’)
) [ [‘symb’, ‘retdate’, ‘dollarVol’ ] ]
df2010C.head()
Using the 2010 data, return the stock (symbol only) with the largest volume on Jan 11th that also appears on Dec 1st.
df2010C = df2010.copy()
(df2010.loc[ (df2010C.retdate == ‘11-Jan-2010’)
& (df2010C.symb.isin(
df2010C.loc[(df2010C.retdate == ‘01-Dec-2010’)]
[‘symb’] ) )
]
.nlargest(1,’vol’)
) [ [ ‘symb’ ] ]
Using the 2010 data, return the stock symbol and a column called “HFlag” which is equal to 1 if the high - low is greater than 1 and zero otherwise. Only return those companies who stock symbol begins or ends with “A”.
df2010c = df2010.copy()
df2010c. loc[(df2010c.high - df2010c.low) > 1, ‘HFlag’] = 1
df2010c. loc[(df2010c.high - df2010c.low) < 1, ‘HFlag’] = 0
df2010c1 = df2010c.loc[ (df2010c.symb.str.startswith(‘A’)
| (df2010c.symb.str.endswith(‘A’))
] [‘symb’, ‘HFlag’]]
df2010c1.head()
Write a query which returns (a) the date, (b) closing price, (c) a flag (‘gt30’) which is equal to 1 when the closing price is greater than $30.00 and 0 otherwise for ‘AAPL’ in 2010.
df2010c = df2010.copy()
df2010c. loc[df2010c.cls > 30, ‘gt30’] = 1
df2010c. loc[df2010c.cls < 30, ‘gt30’] = 0
df2010c1 = df2010c.loc[ (df2010c.symb == ‘AAPL’) ]
[ [ ‘retdate’, ‘cls’, ‘gt30’ ] ]
df2010c1.head()
Return the list of symbols that exist in 2011, but not 2010.
df2010C = df2010.copy() df2011C = df2011.copy()
df2010unique = df2010.symb.drop_duplicates()
(df2011C.loc[ (~df2011C.symb.isin(df2010unique) ) ]
[‘symb’]
.drop_duplicates())
Using the fnd data, return company name, year, and the column called “Hflag” which is equal to 1 if the company has a net income larger than $1 Billion dollars and 0 otherwise. Only include those companies whose name begins with “B”
dffndC = dffnd.copy()
dffndC.loc[(dffndC.netinc > 1000), ‘HFlag’] = 1
dffndC.loc[(dffndC.netinc <= 1000), ‘HFlag’] = 0
dffndC.loc[ (dffndC.conm.str.startswith(‘B’) ) ]
[ [ ‘conm’, ‘fyear’, ‘HFlag’ ] ]
Using the fnd data, which ticker symbols have a net income to employee ratio greater than $1000 in fiscal year 2010 and also have a net income between 20 and 30 million dollars in 2011?
dffndC = dffnd.copy()
tics = dffndC.loc[ (dffndC.fyear == 2011)
& (dffndC.netinc > 20 )
& (dffndC.netinc < 30)][‘tic’]
dffndC = dffndC.loc[ (dffndC.emp > 0 )
& (dffndC.netinc > dffndC.emp)
& (dffndC.fyear == 2010)
& (dffndC.tic.isin(tics)) ]
[ [ ‘tic’ ] ]
dffndC
Using the fnd data, which companies (company name) in fiscal year 2010 had a profit margin greater than 20%, turnover more than 2 and more than 10,000 employees?
dffndc = dffnd.copy()
dffndc1 = dffndc.loc[ (dffndc.fyear == 2010)
& (dffndc.rev > 0)
& (dffndc.invt > 0)
& (((dffndc.netinc / dffndc.rev) * 100) > 20)
& ((dffndc.rev / dffndc.invt) > 2)
& (dffndc.emp > 10) ]
[ [ ‘conm’ ] ]
dffndc1.head()
The lowest five symbols by volume from January 11th, 2010 that have a volume between 1 million and 10 million on December 1st, 2011. In other words, of those stocks which had between 1 and 10 million shares traded on December 1st, 2011, which five have the lowest volume traded on January 11th, 2010.
df2010c = df2010.copy() df2011c = df2011.copy()
df2011symb = df2011c.loc[ (df2011c.retdate == ‘01-Dec-2011’)
& ((df2011c.vol > 1000000)
| (df2011c.vol < 10000000)) ]
[ ‘symb’]
df2010c1 = df2010c.loc[ (df2010c.symb.isin(df2011symb))
& (df2010c.retdate == ‘11-Jan-2010’) ]
.nsmallest(5,’vol’)
[‘symb’]
df2010c1.head()
Of the stocks (symbols) that existed in 2011, but not in 2010, which had the highest closing price in 2011?
df2010C = df2010.copy() df2011C = df2011.copy()
( df2011C.loc[ (~df2011C.symb.isin(df2010C.symb.drop_duplicates())) ] .nlargest(1,'cls') ['symb'] )
Which symbols were in the top 500 of dollar volume on the 2nd, 3rd, and 4th days of February 2011?
(The stock needs to be in the top 500 for all days)
df2011C = df2011.copy()
df2011C[‘dollarVol’] = df2011C.cls * df2011C.vol
df2nd = df2011C.loc[(df2011C.retdate == ‘02-Feb-2011’)]
.nlargest(500,’dollarVol’)
df3rd = df2011C.loc[(df2011C.retdate == ‘03-Feb-2011’)]
.nlargest(500,’dollarVol’)
df4th = df2011C.loc[(df2011C.retdate == ‘04-Feb-2011’)]
.nlargest(500,’dollarVol’)
- assign each day’s top 500
- by dollarVol to new data frames
( df2011C.loc[ (df2011C.symb.isin(df2nd.symb))
& (df2011C.symb.isin(df3rd.symb))
& (df2011C.symb.isin(df4th.symb)) ]
.symb.drop_duplicates() )
Of the symbols that had volume between 100,000 and 1,000,000 on the 2nd and 3rd of February 2011, which had volume greater than 5,000,000 on the 4th of February?
redo using above method
Write a query to generate the following dataset:
- company name, ticker symbol, revenue for all the companies whose name begin with “A” or “a”
- a column, revflag which is 1 if the revenue is greater than $25,000,000 and 0 otherwise
dffndc = dffnd.copy()
dffndc1 = dffndc.loc[ (dffndc.conm.str.startswith(‘A’))
| (dffndc.conm.str.startswith(‘a’)) ]
[‘conm’, ‘tic’, ‘rev’]
dffndc1. loc[ (dffndc1.rev > 25), ‘revflag’] = 1
dffndc1. loc[ (dffndc1.rev < 25), ‘revflag’] = 0
dffndc1.head()
Write a query to generate the following dataset:
- company name, ticker symbol, revenue, inventory, and employee information from fiscal year 2010
- A column called turnflag which is 1 for companies with turnover greater than 2, 0 otherwise
- For a company to be included it must have revenue, inventory, and employee all greater than zero for both 2010 and 2011
dffndc = dffnd.copy()
dffndcComp2011 = dffndc.loc[ (dffndc.fyear == 2011)
& (dffndc.rev > 0)
& (dffndc.invt > 0)
& (dffndc.emp > 0) ]
[ ‘conm’ ]
dffndcSelCom2010 = dffndc.loc[ (dffndc.fyear == 2010)
& (dffndc.rev > 0)
& (dffndc.invt > 0)
& (dffndc.emp > 0)
& (dffndc.conm.isin(dffndcComp2011)) ]
dffndcSelCom2010.loc[ (dffndcSelCom2010.invt > 0)
& ((dffndcSelCom2010.netinc/dffndcSelCom2010.invt) > 2)
, ‘turnflag’] = 1
dffndcSelCom2010.loc[ dffndcSelCom2010.turnflag != 1
, ‘turnflag’] = 0
dffndcFinSel = dffndcSelCom2010.loc[: ,
[‘conm’, ‘tic’,’rev’, ‘invt’, ‘emp’] ]
dffndcFinSel.head()