Pandas hw2b - Subqueries Flashcards

1
Q

Using the daily stock data from 2010, return a list of the unique trading days in 2010.

A

df2010C = df2010C.copy()

df2010C.retdate.drop_duplicates()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Return the symbol, date, and the dollar volume traded for the highest dollar volume traded stocks in 2010 on the NYSE.

A

df2010C = df2010C.copy()

df2010C[ ‘dollarVol’ ] = df2010C.vol * df2010C.cls

df2010C = (df2010C.loc[ (df2010C.exch == ‘NYSE’) ]
.nlargest(1, ‘dollarVol’)
) [ [‘symb’, ‘retdate’, ‘dollarVol’ ] ]

df2010C.head()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Using the 2010 data, return the stock (symbol only) with the largest volume on Jan 11th that also appears on Dec 1st.

A

df2010C = df2010.copy()

(df2010.loc[ (df2010C.retdate == ‘11-Jan-2010’)
& (df2010C.symb.isin(
df2010C.loc[(df2010C.retdate == ‘01-Dec-2010’)]
[‘symb’] ) )
]
.nlargest(1,’vol’)
) [ [ ‘symb’ ] ]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Using the 2010 data, return the stock symbol and a column called “HFlag” which is equal to 1 if the high - low is greater than 1 and zero otherwise. Only return those companies who stock symbol begins or ends with “A”.

A

df2010c = df2010.copy()

df2010c. loc[(df2010c.high - df2010c.low) > 1, ‘HFlag’] = 1
df2010c. loc[(df2010c.high - df2010c.low) < 1, ‘HFlag’] = 0

df2010c1 = df2010c.loc[ (df2010c.symb.str.startswith(‘A’)
| (df2010c.symb.str.endswith(‘A’))
] [‘symb’, ‘HFlag’]]

df2010c1.head()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Write a query which returns (a) the date, (b) closing price, (c) a flag (‘gt30’) which is equal to 1 when the closing price is greater than $30.00 and 0 otherwise for ‘AAPL’ in 2010.

A

df2010c = df2010.copy()

df2010c. loc[df2010c.cls > 30, ‘gt30’] = 1
df2010c. loc[df2010c.cls < 30, ‘gt30’] = 0

df2010c1 = df2010c.loc[ (df2010c.symb == ‘AAPL’) ]
[ [ ‘retdate’, ‘cls’, ‘gt30’ ] ]

df2010c1.head()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Return the list of symbols that exist in 2011, but not 2010.

A
df2010C = df2010.copy()
df2011C = df2011.copy()

df2010unique = df2010.symb.drop_duplicates()

(df2011C.loc[ (~df2011C.symb.isin(df2010unique) ) ]
[‘symb’]
.drop_duplicates())

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Using the fnd data, return company name, year, and the column called “Hflag” which is equal to 1 if the company has a net income larger than $1 Billion dollars and 0 otherwise. Only include those companies whose name begins with “B”

A

dffndC = dffnd.copy()

dffndC.loc[(dffndC.netinc > 1000), ‘HFlag’] = 1
dffndC.loc[(dffndC.netinc <= 1000), ‘HFlag’] = 0

dffndC.loc[ (dffndC.conm.str.startswith(‘B’) ) ]
[ [ ‘conm’, ‘fyear’, ‘HFlag’ ] ]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Using the fnd data, which ticker symbols have a net income to employee ratio greater than $1000 in fiscal year 2010 and also have a net income between 20 and 30 million dollars in 2011?

A

dffndC = dffnd.copy()

tics = dffndC.loc[ (dffndC.fyear == 2011)
& (dffndC.netinc > 20 )
& (dffndC.netinc < 30)][‘tic’]

dffndC = dffndC.loc[ (dffndC.emp > 0 )
& (dffndC.netinc > dffndC.emp)
& (dffndC.fyear == 2010)
& (dffndC.tic.isin(tics)) ]
[ [ ‘tic’ ] ]

dffndC

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Using the fnd data, which companies (company name) in fiscal year 2010 had a profit margin greater than 20%, turnover more than 2 and more than 10,000 employees?

A

dffndc = dffnd.copy()

dffndc1 = dffndc.loc[ (dffndc.fyear == 2010)
& (dffndc.rev > 0)
& (dffndc.invt > 0)
& (((dffndc.netinc / dffndc.rev) * 100) > 20)
& ((dffndc.rev / dffndc.invt) > 2)
& (dffndc.emp > 10) ]
[ [ ‘conm’ ] ]

dffndc1.head()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

The lowest five symbols by volume from January 11th, 2010 that have a volume between 1 million and 10 million on December 1st, 2011. In other words, of those stocks which had between 1 and 10 million shares traded on December 1st, 2011, which five have the lowest volume traded on January 11th, 2010.

A
df2010c = df2010.copy()
df2011c = df2011.copy()

df2011symb = df2011c.loc[ (df2011c.retdate == ‘01-Dec-2011’)
& ((df2011c.vol > 1000000)
| (df2011c.vol < 10000000)) ]
[ ‘symb’]

df2010c1 = df2010c.loc[ (df2010c.symb.isin(df2011symb))
& (df2010c.retdate == ‘11-Jan-2010’) ]
.nsmallest(5,’vol’)
[‘symb’]

df2010c1.head()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Of the stocks (symbols) that existed in 2011, but not in 2010, which had the highest closing price in 2011?

A
df2010C = df2010.copy()
df2011C = df2011.copy()
( df2011C.loc[ 
             (~df2011C.symb.isin(df2010C.symb.drop_duplicates())) 
                     ]
                .nlargest(1,'cls')
                ['symb'] )
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Which symbols were in the top 500 of dollar volume on the 2nd, 3rd, and 4th days of February 2011?
(The stock needs to be in the top 500 for all days)

A

df2011C = df2011.copy()

df2011C[‘dollarVol’] = df2011C.cls * df2011C.vol

df2nd = df2011C.loc[(df2011C.retdate == ‘02-Feb-2011’)]
.nlargest(500,’dollarVol’)
df3rd = df2011C.loc[(df2011C.retdate == ‘03-Feb-2011’)]
.nlargest(500,’dollarVol’)
df4th = df2011C.loc[(df2011C.retdate == ‘04-Feb-2011’)]
.nlargest(500,’dollarVol’)

    • assign each day’s top 500
    • by dollarVol to new data frames

( df2011C.loc[ (df2011C.symb.isin(df2nd.symb))
& (df2011C.symb.isin(df3rd.symb))
& (df2011C.symb.isin(df4th.symb)) ]
.symb.drop_duplicates() )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Of the symbols that had volume between 100,000 and 1,000,000 on the 2nd and 3rd of February 2011, which had volume greater than 5,000,000 on the 4th of February?

A

redo using above method

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Write a query to generate the following dataset:

  • company name, ticker symbol, revenue for all the companies whose name begin with “A” or “a”
  • a column, revflag which is 1 if the revenue is greater than $25,000,000 and 0 otherwise
A

dffndc = dffnd.copy()

dffndc1 = dffndc.loc[ (dffndc.conm.str.startswith(‘A’))
| (dffndc.conm.str.startswith(‘a’)) ]
[‘conm’, ‘tic’, ‘rev’]

dffndc1. loc[ (dffndc1.rev > 25), ‘revflag’] = 1
dffndc1. loc[ (dffndc1.rev < 25), ‘revflag’] = 0

dffndc1.head()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Write a query to generate the following dataset:

  • company name, ticker symbol, revenue, inventory, and employee information from fiscal year 2010
  • A column called turnflag which is 1 for companies with turnover greater than 2, 0 otherwise
  • For a company to be included it must have revenue, inventory, and employee all greater than zero for both 2010 and 2011
A

dffndc = dffnd.copy()

dffndcComp2011 = dffndc.loc[ (dffndc.fyear == 2011)
& (dffndc.rev > 0)
& (dffndc.invt > 0)
& (dffndc.emp > 0) ]
[ ‘conm’ ]

dffndcSelCom2010 = dffndc.loc[ (dffndc.fyear == 2010)
& (dffndc.rev > 0)
& (dffndc.invt > 0)
& (dffndc.emp > 0)
& (dffndc.conm.isin(dffndcComp2011)) ]

dffndcSelCom2010.loc[ (dffndcSelCom2010.invt > 0)
& ((dffndcSelCom2010.netinc/dffndcSelCom2010.invt) > 2)
, ‘turnflag’] = 1

dffndcSelCom2010.loc[ dffndcSelCom2010.turnflag != 1
, ‘turnflag’] = 0

dffndcFinSel = dffndcSelCom2010.loc[: ,
[‘conm’, ‘tic’,’rev’, ‘invt’, ‘emp’] ]

dffndcFinSel.head()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly