Python Flashcards
Dataframes: How to select all rows of a df where in col1 “Black” and “White” appear?
df[df[“col1”].isin([“Black”,”White”])]
Dataframes: How to select 2 columns?
df[[“col1”,”col2”]]
Dataframes: How to sort 2 columns of a df in descending order?
df.sort_values([“col1”,”col2”], ascending=[False, False])
Dataframes: How to get the values of a df?
df.values
Dataframes: How to get the description of a df?
df.describe()
Dataframes: How to get a shape of a df?
df.shape
Dataframes: How to get info about a df?
df.info()
Merging to the nearest date-times: How to replicate VLOOKUP(range_lookup=TRUE)?
pd.merge_asof(df1,df2,direction=”backward”) - direction:backward, forward, nearest
Advantage of pd.merge_ordered() vs. pd.merge?
pd.merge_ordered() has fill_method
Joining data: 2dfs (current, drafted) merged on column name and having the suffixes (‘_current’, ‘_drafted’) - how to sort the new df based on the column on which was merged?
current.merge(drafted, on=”name”, suffixes=(“_current”, “_drafted”), sort=True)
Joining data: Merge 2dfs (current, drafted) with an outer join and suffixes (‘_current’,’_drafted’) - how to show from which df a value is coming from?
current.merge(drafted, how=”outer”, on=”name”, suffixes=(“_current”,”_drafted”), indicator=True)
Joining data: Merging the df teams having the column player_id with df positions having player_id as index?
teams.merge(positions, left_on=”player_id”, right_index=True)
What are the joint default values for pd.merge and pd.merge_ordered?
pd.merge=inner join, pd.merge_ordered=outer join
Dataframes: How to use pd.merge_ordered with “ffill”?
pd.merge_ordered(df1, df2, on=”Date”, fill_method=”ffill”)
Dataframes: How to merge two dataframes with outer and sort index column called “Date”?
pd.merge(df1, df2, how=”outer”).sorted_values(“Date”)
Dataframes: How to join dataframes with integrated .join function with an inner join?
df1.join(df2, how=”inner”)
Dataframes: How to make an “inner” join with pd.merge on 2 columns and 2 suffixes?
pd.merge(df1, df2, on=[“col1”,”col2”], suffixes=[“_1”,”_2”], how=”inner”)
Dataframes: How to merge two dataframes on different columns in both dataframes?
pd.merge(df1,df2, left_on=”col1”, right_on=”col2”)
Dataframes: How to merge on several columns and/with using suffixes?
pd.merge(df1,df2,on=[“col1”,”col2”],suffixes=[“_1”,”_2”])
Dataframes: How to merge on several columns?
pd.merge(df1,df2,on=[“col1”,”col2”])
Dataframes: How to merge on one specific column?
pd.merge(df1,df2,on=”col1”)
Dataframes: What happens if pd.merge() is applied on 2 dataframes without any additional arguments?
computes a merge on all columns that occur in both dataframes, this is by default an “inner” join because it glues together only rows that match in the joining column of BOTH dataframes
Dataframes: Pandas: How to extend/ replace concat() with the ability to align rows using multiple columns?
pd.merge()
Dataframes: how to sort data by index?
df.sort_index()
Dataframes: How to use pd.concat to apply multi-indexing?
df_new=pd.concat([df1,df2],keys=[2020,2021],axis=0)
Dataframes: Print all rows that contain the name “Tom” in the column “name”
print(df.loc[df[“name”]==”Tom”])
Dataframe: When using .append: how to make a new RangeIndex of unique integers for each row?
df.total=df1.append(df2,ignore_index=True)
Concatenating rows: How to stack them horizontally?
pd.concat([df1,df2],axis=1)
Concatenating rows: How to stack them vertically?
pd.concat([df1,df2], axis=0)
Dataframes: How to get the name of an index?
df.index.name
Dataframes: How to use concat and reset index?
df_new=pd.concat([df1,df2], ignore_index=True)
Dataframes: How to use “append” with reset index?
df_new=df.append(df2).reset_index(drop=True)
Dataframes: How to replace all “F” with “C” in all column names in a df?
df_new.columns=df.columns.str.replace(“F”,”C”)
Dataframes: with .add(): how to add several dataframes with a fill_value=0?
bronze.add(silver,fill_value=0).add(gold,fill_value=0)
Dataframes: How to add one dataframe to another?
df1.add(df2)
Dataframe: How to add one dataframe to another and use 0 if value doesn’t exist?
df1.add(df2, fill_value=0)
Dataframes: how to calculate the percent changes?
df.pct_change()
Dataframes: How to divide one dataframe by another?
df.divide(df2, axis=”rows”)
Dataframes: How to use .loc for column “sales” for “Jan” to “Mar”?
df.loc[“Jan”:”Mar”,”sales”]
Dataframes: How to use .loc for the columns “Sales” and “Price” for “Jan” to “Mar”?
df.loc[“Jan”:”Mar”,[“Sales”,”Price”]]
Dataframes: if index is a date column how can the dates be parsed?
pd.read_csv(“sales.csv”, index_col=”Dates”, parse_dates=True)
Dataframes: When reindexing: time chaining method to replace the null values with the last preceding non-null value?
sales.reindex().ffill()
Dataframe: How to sort the index in reverse order?
df.sort_index(ascending=False)
Dataframes: How to sort the values in a column?
df.sort_values(“col1”)
Dataframes: how to drop rows with NA?
df.dropna()
What happens when a dataframe is reindexed with a missing label?
an empty row will be inserted
Reindex a dataframe with different dataframe?
df.reindex(df2.index)
How to sort the index of a dataframe?
df.sort_index()
Reindex dataframes: with first three months?
df=df.reindex([“Jan”,”Feb”,”Mar”])
Show index of a dataframe?
print(df.index)
With pd.read_csv: how to determine the index column?
index_col
Using a loop to append csv files into a dataframe?
filenames=[“df1.csv”,”df2.csv”], dataframes=[pd.read_csv(f) for f in filenames]
How to import csv files with * and glob?
from glob import glob, filenames=glob(“sales*.csv”), dataframes=[pd.read_csv(f) for f in filenames]
Dataframes: How to make an “outer” join with pd.concat and horizontally?
pd.concat([df1,df2], axis=1, join=”outer”)
Dataframes: How to make an “inner” join with pd.concat and horizontally?
pd.concat([df1,df2], axis=1, join=”inner”)
Pandas: How to sort by index values?
df.sort_index()
Directly on df: How to rotate x-axis by 45-degrees?
by putting “rot” into plot - df.plot(… rot=45), plt.show()
Directly on df: How to make a scatter plot?
df.plot(x=”column1”, y=”column2”, kind=”scatter”), plt.show()
Directly on df: How to add a legend?
plt.legend([“title1”,”title2”])
Directly on df: How to make a histogram partially transparent?
df.hist(…, alpha=0.7)
How to detect missing values with True and False in a df?
df.isna()
How to detect missing values with True and False in a df in any column=
df.isna().any()
How to count the missing values in a column?
df.isna().sum()
How to visualize in a boxplot the number of missing values in a dataframe?
df.isna().sum().plot(kind=”box”), plt.show()
How to remove rows with missing values in a df?
df.dropna()
How to fill NaN values in a df?
df.fillna()
Example of a very simple dictionary?
my_dict={“key1”:value1, “key2”:value2, “key3”:value3}
How to access the value of a dictionary?
my_dict[“key1”]
How to make a list of dictionaries by row and then put it into a dataframe?
list_of_dicts=[{“key1”:value1, “key2”:value2, “key3”:value3}, {“key1”:value4, “key2”:value5, “key3”:value6}] + pd.DataFrame(list_of_dicts)
How to make a dictionary of lists by column and then put it into a dataframe?
dict_of_lists={“key1”:[“value1”,”value2”], “key2”:[“value3”,”value4”], “key3”:[“value5”,”value6”]} + pd.DataFrame(dict_of_lists)
Directly on df: How to show a histogram with 20 bins?
df.plot(kind=”hist”, bins=20) + plt.show()
Directly on df: Bar plot with title?
df.plot(kind=”bar”, title=”irgendwas”) + plt.show()
Directly on df: How to make a line chart with x-axis based on a column and y-axis based on a different column?
df.plot(kind=”line”, x=”col1”, y=”col2”) + plt.show()
Direct on df: How to show a histogram?
df.plot(kind=”hist”) + plt.show()
Pandas: How to set a column as the index?
df=df.set_index(“col1”)
Pandas: How to remove an index?
df.reset_index()
Pandas: How to drop an index and delete the respective column?
df.reset_index(drop=True)
Pandas: How to use .loc to get the same result: df[df[“col1”].isin([“name1”,”name2”])]?
df.loc[[“name1”,”name2”]]
Pandas: How to use pivot_table with 0 instead of NaN for missing values and having the mean at the end of rows and columns?
df.pivot_table(values=”col1”, index=”col2”, columns=”col3”, fill_value=0, margins=True)
How to transform df.groupby([“col1”,”col2”,”col3”]).mean() into a pivot_table?
df.pivot_table(values=”col1”, index=”col2”, columns=”col3”)
Pandas: How to use multiple statistics with pivot_table?
df.pivot_table(values=”col1”, index=”col2”, aggfunc=[np.mean, np.median])
Pandas: How to change df.groupby(“col1”)[“col2”].mean() into a pivot table?
df.pivot_table(values=”col2”, index=”col1”)
Pandas: How to group by multiple variables?
df.groupby([“col1”,”col2”])[“col3”].mean()
Pandas: How to get the min, max and sum for a column based on the characteristic of another column and groupby?
df.groupby(“col1”)[“col2”].agg([min,max,sum])
Pandas: How to get the mean values of a column based on the characteristics of another column and groupby?
df.groupby(“col1”)[“col2”].mean()
Pandas: How to show the proportionate amount of how often a value appears in a column?
df[“col1”].value_counts(normalize=True)
Pandas: How to count the values in a column and have the highest values first?
df[“col1”].value_counts(sort=True)
Pandas: How to count the values in a column?
df[“col1”].value_counts()
Pandas: How to drop duplicates based on 2 columns?
unique=df.drop_duplicates(subset=[“col1”,”col2”])
Pandas: How to drop duplicates base on a column?
df.drop_duplicates(subset=”col1”)
Pandas: How to get the average of a column?
df[“col1”].mean()
Pandas: How to get the variance of a column?
df[“col1”].var()
Pandas: How to calculate the cumulative max?
df[“col1”].cummax()
Pandas: How to calculate the cumulative sum?
df[“col1”].cumsum()
Pandas: How to get the sum of a column?
df[“col1”].sum()
Pandas: How to get the maximum value of a column?
df[“col1”].max()
Pandas: How to merge 3 dfs at once? First two on col1 and then col2.
df.merge(df2, on=”col1”) \
.merge(df3, on=”col2”)
Pandas: How to count the number of rows in col1 where something is missing?
df[“col1”].isnull().sum()
Datetime: How to print/present a datetime object in a certain format?
datetime.datetime.strftime(“%Y-%m-%d”)
Datetime: How to transform a string into a datetime format?
datetime.datetime.strptime(date_str, “%Y-%m-%d”)
Datetime: How to calculate the difference between datetimes?
with timedelta
Datetime: How to substract 1 week from datetime?
datetime.now() - timedelta(weeks = 1)
Dictionaries: How to ask the value of a key and not getting an error message if the value doesn’t exist?
my_dict.get(“key1”)
Dictionaries: How to ask the value of a key and getting a certain message if the value doesn’t exist?
my_dict.get(“key1”, “certain_message”)
Dictionaries: How to delete a value from the dictionary?
del(my_dict[“key1”])
Module to import data from Yahoo Finance?
yfinance
yfinance: Download of SPY and AAPL?
data = yf.download(“SPY AAPL”, start=”2017-01-01”, end=”2017-04-30”)
yfinance: Download of AAPL dividends?
apple= yf.Ticker(“aapl”), apple.dividends
How to calculate log-returns for msft[‘Log_Ret’] based on msft[‘Close’]?
msft[‘Log_Ret’] = np.log(msft[‘Close’] / msft[‘Close’].shift(1))
How to calculate volatility for msft[‘Volatility’] based on msft[‘Log_Ret’]?
msft[‘Volatility’] = msft[‘Log_Ret’].rolling(252).std() * np.sqrt(252)
How plot the two columns msft[‘Close’] and msft[‘Volatility’] in two subplots but one graph and having for the lines the color blue and a figsize of 8,6?
msft[[‘Close’, ‘Volatility’]].plot(subplots=True, color=’blue’, figsize=(8, 6))
How to time codes, i.e. how long it takes to execute them?
with %timeit or %%timeit
Numpy: How to create a random standard normal variable?
np.random.standard_normal()
Numpy: How to insert values based on values of another column?
np.where(sp500[‘42-252’] > SD, 1, 0)
Backtesting a strategy: How to determine the backtest regime (sp500[‘Regime’]) based on column sp500[‘42-252’] and criteria SD?
sp500[‘Regime’]=np.where(sp500[‘42-252’] > SD, 1, 0)
sp500[‘Regime’]=np.where(sp500[‘42-252’] < -SD, -1, sp500[‘Regime’])
sp500[‘Regime’].value_counts()
Backtesting a strategy: How to calculate the day to day return of the strategy in the column sp500[‘Strategy_Returns’] based on column sp500[‘Regime’] and sp500[‘Market_Returns’]?
sp500[‘Strategy_Returns’] = sp500[‘Regime’].shift(1) * sp500[‘Market_Returns’]
Backtesting a strategy: How to plot the day to day returns of a strategy as well as the S&P as a cumulating sum based on the columns sp500[‘Strategy_Returns’] and sp500[‘Market_Returns’]? The graph should show grids and have a figsize of 8,5.
sp500[[‘Strategy_Returns’, ‘Market_Returns’]].cumsum().apply(np.exp).plot(grid=True, figsize=(8, 5))
String objects: How to capitalize string s?
s.capitalize()
String objects: How to split string s?
s.split()
String objects: How to find a string (“string”) in the string s?
s.find(‘string’)
String objects: How to count the number of occurrences of a substring in a string called s?
s.count(sub[, start[, end]])
String: How to separate a list of words in a string called s with “,” as separator?
s.split(“,”)
String: How to replace something in a string called s?
s.replace(old, new)
Lists: How to replace the ith element by x in a list called l?
l[i]=x
Lists: How to replace every kth element from i to j - 1 by s?
l[i:j:k] = s
Lists: How to append the element x to the list object called l?
l.append(x)
Lists: How to count the number of occurences of the element object x in the list object called l?
l.count(x)
Lists: How to delete every kth elements in a list called l with index values i to j – 1?
del l[i:j:k]
Lists: How to append all elements of s to an object called l?
l.extend(s)
Lists: How to show the first index number of x between elements i and j - 1?
l.index(x[, i[, j]])
Lists: How to insert x at/before index i into a list called l?
l.insert(i, x)
Lists: How to remove the value at index i from a list called l?
l.remove(i)
Lists: How to remove an element from a list called l with index i and return it?
l.pop(i)
Lists: How to reverse all items in place in a list called l?
l.reverse()
Lists: How to sort all items in place in a list called l?
l.sort()
Dict: Item of d with k?
d[k]
Dict: Setting item key k to x?
d[k]=x
Dict: Deleting item with key k?
del d[k]
Dict: Removing all items in d?
d.clear()
Dict: How to make a copy of dict d?
d.copy()
Dict: How to check if True if k is a key?
d.has_key(k)
Dict: Copy of all key-value pairs of dict d?
d.items()
Dict: Iterator over all items of dict d?
d.iteritems()
Dict: Iterator over all keys of dict d?
d.iterkeys()
Dict: Iterator over all values of dict d?
d.itervalues()
Dict: Copy of all keys of dict d?
d.keys()
Dict: How to return and remove item key k in dict d?
d.popitem(k)
Dict: How to update items in d with items from d2?
d.update({d2})
Dict: How to show all values of dictionary d?
d.values()
Matplotlib: How to show grids?
plt.grid(True)
plt.axis: How to turn axis and lines and labels off?
plt.axis(‘off’)
plt.axis: How to make equal scaling?
plt.axis(‘equal’)
plt.axis: How make equal scaling via dimension changes?
plt.axis(‘scaled’)
plt.axis: How to make all data visible (tightens limits)?
plt.axis(‘tight’)
plt.axis: How to make all data visible (with data limits)?
plt.axis(‘image’)
plt.axis: How to determine minimum and maximum values of x and y?
xmin, xmax, ymin, ymax or ylim and xlim
matplotlib: How to determine title of a graph?
plt.title()
Matplotlib: How to determine x and y labels?
plt.xlabel and plt.ylabel
Matplotlib: How make a graph with data y and with red circles?
plt.plot(y, ‘ro’)
Matplotlib: How to position the legend?
plt.legend(loc=0)
Matplotlib: How to make a second y-axis?
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
Matplotlib: How to address a subplot?
plt.subplot(numrows, numcols, fignum)
Matplotlib: How to plot two values in a Histogram with data y(having two columns), with the labels “1st” and “2nd” and 25 bins next to each other?
plt.hist(y, label=[‘1st’,’2nd’], bins=25)
Show the index of a dataframe?
df.index
Show the columns of a dataframe?
df.columns
Pandas: creating a DatetimeIndex object for 1.1.2015 with frequency months and occurence 9 times?
pd.date_range(‘2015-1-1’, periods=9, freq=’M’)
Python module to import data via URL?
from urllib.request import urlretrieve
How to use urlretrieve to save a txt file from the web locally?
url=’https://www.stoxx.com/document/Indices/Current/HistoricalData/h_vstoxx.txt’
urlretrieve(url, ‘./data/vstoxx.txt’)
How to remove all blanks in a txt file?
lines=open(‘./data/vstoxx.txt’, ‘r’).readlines()
lines=[line.replace(‘ ‘,’’) for line in lines]
How to fill in a dataframe the NA values with the last available values from the data series?
data.fillna(method=’ffill’)
How to make a regression with xdat and ydat?
pd.ols(y=ydat, x=xdat)
How to calculate the correlation between two variables in df “rets” having the columns .SPX and .VIX?
rets.corr()
How to import a csv file (data.csv) and read all lines?
csv_file=open(path + ‘data.csv’ + ‘r’)
content = csv_file.readlines()
How to use the arange function of NumPy to generate minutes data from 2020-01-01 10:00:00 to 2020-12-31 10:00:00)?
np.arange(‘2020-01-01 10:00:00’, ‘2020-12-31 10:00:00’, dtype=’datetime64[m]’)
How to turn the grid on in a graph?
plt.grid(True)
How to create 10 random numbers?
[random.random() for _ in range(10)]
How to create 10 random numbers between 1 and 5?
[random.randint(1,5) for _ in range(10)]
How to make a random amount of letters of the alphabet and then count which letter occurs how many times?
import random import string lc=string.ascii_lowercase rc=[random.choice(lc) for _ in range(10000)] c={c:rc.count(c) for c in lc}
Pandas: How to copy/insert a website into a variable?
pd.read_html(url)
If share price is in df data: how to normalize the share price and plot it in a 8,6 figsize?
(data/data.ix[0]*100).plot(figsize=(8, 6))
If share price is in df data: How to calculate log-returns?
np.log(data/ data.shift(1))
How to calculate from a df of daily log-returns a covariance matrix?
rets.cov()*252
How to make a colorbar for a scatter plot with the label “Sharpe ratio”?
plt.colorbar(label=’Sharpe ratio’)
Pandas: How to read data from an Excel file from a workbook called “workbook.xlsx” and sheet called “first_sheet”?
df=pd.read_excel(path + ‘workbook.xlsx’, ‘first_sheet’, header=None)
Pandas: How to write a df into an Excel sheet called “new_book_1.xlsx” and into “my_sheet”?
df.to_excel(path + “new_book_1.xlsx”, “my_sheet”)
How to import matplotlib module?
import matplotlib.pyplot as plt
How to create figure and axis object in matplotlib?
fig, ax = plt.subplots()