Python Flashcards
Dataframes: How to select all rows of a df where in col1 “Black” and “White” appear?
df[df[“col1”].isin([“Black”,”White”])]
Dataframes: How to select 2 columns?
df[[“col1”,”col2”]]
Dataframes: How to sort 2 columns of a df in descending order?
df.sort_values([“col1”,”col2”], ascending=[False, False])
Dataframes: How to get the values of a df?
df.values
Dataframes: How to get the description of a df?
df.describe()
Dataframes: How to get a shape of a df?
df.shape
Dataframes: How to get info about a df?
df.info()
Merging to the nearest date-times: How to replicate VLOOKUP(range_lookup=TRUE)?
pd.merge_asof(df1,df2,direction=”backward”) - direction:backward, forward, nearest
Advantage of pd.merge_ordered() vs. pd.merge?
pd.merge_ordered() has fill_method
Joining data: 2dfs (current, drafted) merged on column name and having the suffixes (‘_current’, ‘_drafted’) - how to sort the new df based on the column on which was merged?
current.merge(drafted, on=”name”, suffixes=(“_current”, “_drafted”), sort=True)
Joining data: Merge 2dfs (current, drafted) with an outer join and suffixes (‘_current’,’_drafted’) - how to show from which df a value is coming from?
current.merge(drafted, how=”outer”, on=”name”, suffixes=(“_current”,”_drafted”), indicator=True)
Joining data: Merging the df teams having the column player_id with df positions having player_id as index?
teams.merge(positions, left_on=”player_id”, right_index=True)
What are the joint default values for pd.merge and pd.merge_ordered?
pd.merge=inner join, pd.merge_ordered=outer join
Dataframes: How to use pd.merge_ordered with “ffill”?
pd.merge_ordered(df1, df2, on=”Date”, fill_method=”ffill”)
Dataframes: How to merge two dataframes with outer and sort index column called “Date”?
pd.merge(df1, df2, how=”outer”).sorted_values(“Date”)
Dataframes: How to join dataframes with integrated .join function with an inner join?
df1.join(df2, how=”inner”)
Dataframes: How to make an “inner” join with pd.merge on 2 columns and 2 suffixes?
pd.merge(df1, df2, on=[“col1”,”col2”], suffixes=[“_1”,”_2”], how=”inner”)
Dataframes: How to merge two dataframes on different columns in both dataframes?
pd.merge(df1,df2, left_on=”col1”, right_on=”col2”)
Dataframes: How to merge on several columns and/with using suffixes?
pd.merge(df1,df2,on=[“col1”,”col2”],suffixes=[“_1”,”_2”])
Dataframes: How to merge on several columns?
pd.merge(df1,df2,on=[“col1”,”col2”])
Dataframes: How to merge on one specific column?
pd.merge(df1,df2,on=”col1”)
Dataframes: What happens if pd.merge() is applied on 2 dataframes without any additional arguments?
computes a merge on all columns that occur in both dataframes, this is by default an “inner” join because it glues together only rows that match in the joining column of BOTH dataframes
Dataframes: Pandas: How to extend/ replace concat() with the ability to align rows using multiple columns?
pd.merge()
Dataframes: how to sort data by index?
df.sort_index()
Dataframes: How to use pd.concat to apply multi-indexing?
df_new=pd.concat([df1,df2],keys=[2020,2021],axis=0)
Dataframes: Print all rows that contain the name “Tom” in the column “name”
print(df.loc[df[“name”]==”Tom”])
Dataframe: When using .append: how to make a new RangeIndex of unique integers for each row?
df.total=df1.append(df2,ignore_index=True)
Concatenating rows: How to stack them horizontally?
pd.concat([df1,df2],axis=1)
Concatenating rows: How to stack them vertically?
pd.concat([df1,df2], axis=0)
Dataframes: How to get the name of an index?
df.index.name
Dataframes: How to use concat and reset index?
df_new=pd.concat([df1,df2], ignore_index=True)
Dataframes: How to use “append” with reset index?
df_new=df.append(df2).reset_index(drop=True)
Dataframes: How to replace all “F” with “C” in all column names in a df?
df_new.columns=df.columns.str.replace(“F”,”C”)
Dataframes: with .add(): how to add several dataframes with a fill_value=0?
bronze.add(silver,fill_value=0).add(gold,fill_value=0)
Dataframes: How to add one dataframe to another?
df1.add(df2)
Dataframe: How to add one dataframe to another and use 0 if value doesn’t exist?
df1.add(df2, fill_value=0)
Dataframes: how to calculate the percent changes?
df.pct_change()
Dataframes: How to divide one dataframe by another?
df.divide(df2, axis=”rows”)
Dataframes: How to use .loc for column “sales” for “Jan” to “Mar”?
df.loc[“Jan”:”Mar”,”sales”]
Dataframes: How to use .loc for the columns “Sales” and “Price” for “Jan” to “Mar”?
df.loc[“Jan”:”Mar”,[“Sales”,”Price”]]
Dataframes: if index is a date column how can the dates be parsed?
pd.read_csv(“sales.csv”, index_col=”Dates”, parse_dates=True)
Dataframes: When reindexing: time chaining method to replace the null values with the last preceding non-null value?
sales.reindex().ffill()
Dataframe: How to sort the index in reverse order?
df.sort_index(ascending=False)
Dataframes: How to sort the values in a column?
df.sort_values(“col1”)
Dataframes: how to drop rows with NA?
df.dropna()
What happens when a dataframe is reindexed with a missing label?
an empty row will be inserted
Reindex a dataframe with different dataframe?
df.reindex(df2.index)
How to sort the index of a dataframe?
df.sort_index()
Reindex dataframes: with first three months?
df=df.reindex([“Jan”,”Feb”,”Mar”])
Show index of a dataframe?
print(df.index)
With pd.read_csv: how to determine the index column?
index_col
Using a loop to append csv files into a dataframe?
filenames=[“df1.csv”,”df2.csv”], dataframes=[pd.read_csv(f) for f in filenames]
How to import csv files with * and glob?
from glob import glob, filenames=glob(“sales*.csv”), dataframes=[pd.read_csv(f) for f in filenames]
Dataframes: How to make an “outer” join with pd.concat and horizontally?
pd.concat([df1,df2], axis=1, join=”outer”)
Dataframes: How to make an “inner” join with pd.concat and horizontally?
pd.concat([df1,df2], axis=1, join=”inner”)
Pandas: How to sort by index values?
df.sort_index()
Directly on df: How to rotate x-axis by 45-degrees?
by putting “rot” into plot - df.plot(… rot=45), plt.show()
Directly on df: How to make a scatter plot?
df.plot(x=”column1”, y=”column2”, kind=”scatter”), plt.show()
Directly on df: How to add a legend?
plt.legend([“title1”,”title2”])
Directly on df: How to make a histogram partially transparent?
df.hist(…, alpha=0.7)
How to detect missing values with True and False in a df?
df.isna()
How to detect missing values with True and False in a df in any column=
df.isna().any()
How to count the missing values in a column?
df.isna().sum()
How to visualize in a boxplot the number of missing values in a dataframe?
df.isna().sum().plot(kind=”box”), plt.show()
How to remove rows with missing values in a df?
df.dropna()
How to fill NaN values in a df?
df.fillna()
Example of a very simple dictionary?
my_dict={“key1”:value1, “key2”:value2, “key3”:value3}
How to access the value of a dictionary?
my_dict[“key1”]
How to make a list of dictionaries by row and then put it into a dataframe?
list_of_dicts=[{“key1”:value1, “key2”:value2, “key3”:value3}, {“key1”:value4, “key2”:value5, “key3”:value6}] + pd.DataFrame(list_of_dicts)
How to make a dictionary of lists by column and then put it into a dataframe?
dict_of_lists={“key1”:[“value1”,”value2”], “key2”:[“value3”,”value4”], “key3”:[“value5”,”value6”]} + pd.DataFrame(dict_of_lists)
Directly on df: How to show a histogram with 20 bins?
df.plot(kind=”hist”, bins=20) + plt.show()
Directly on df: Bar plot with title?
df.plot(kind=”bar”, title=”irgendwas”) + plt.show()
Directly on df: How to make a line chart with x-axis based on a column and y-axis based on a different column?
df.plot(kind=”line”, x=”col1”, y=”col2”) + plt.show()
Direct on df: How to show a histogram?
df.plot(kind=”hist”) + plt.show()