Pandas Flashcards
PD: In normal Pandas style, what is a row and what is a column?
A row is an item or point, and a column is a field or feature.
PD: Why do most pandas methods return a df rather than editing the original df in place?
It allows method chaining: calling several methods one after another in a single expression, which improves the readability of the code.
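As a minimal sketch of method chaining (the sample data and column names here are made up for illustration), each call below returns a new DataFrame, so the next call can be attached directly to it:

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [3, None, 1, 2]})

# Each method returns a new DataFrame, so the calls can be chained.
result = (
    df.dropna(subset=["score"])   # drop rows with a missing score
      .sort_values("score")       # sort ascending by score
      .reset_index(drop=True)     # renumber the rows from 0
)

print(result)
print(len(df))  # the original df is unchanged and still has 4 rows
```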
PD: Do most pandas methods edit a df in place, or return a new df?
Return a new df
PD: How do you read a table or tables from a web page?
pd.read_html("sampleurl.com",…)
(returns a list of DataFrames, one per table found)
PD: How do you read a table from a .csv?
pd.read_csv(path, sep=" ",…)
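A small sketch of read_csv with a custom separator. To keep it self-contained, the "file" is an in-memory string wrapped in io.StringIO; the contents are made up, and in practice you would pass a real path instead:

```python
import io
import pandas as pd

# Hypothetical space-separated data standing in for a file on disk.
csv_text = "name score\nann 3\nbob 5\n"

# sep=" " tells pandas that fields are separated by a single space.
df = pd.read_csv(io.StringIO(csv_text), sep=" ")

print(df)
```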
PD: How to combine the rows of 2 dfs with the same column headers?
pd.concat([df1,df2])
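A short sketch of stacking two dataframes with the same columns (sample data invented for illustration). Note that pd.concat keeps each frame's original index labels unless ignore_index=True is passed:

```python
import pandas as pd

# Two hypothetical dataframes with identical column headers.
df1 = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})
df2 = pd.DataFrame({"x": [3], "y": ["c"]})

# Rows of df2 are appended below the rows of df1.
combined = pd.concat([df1, df2])

# Same result, but with a fresh 0..n-1 index.
combined_reset = pd.concat([df1, df2], ignore_index=True)

print(combined)
print(combined_reset)
```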
PD: How to view the first 5 rows?
df.head()
PD: How to view the last 5 rows?
df.tail()
PD: How do we remove columns that aren’t useful to us from our dataframe?
to_drop = ["all","col","names","to","drop"]
df.drop(columns=to_drop, inplace=True)
(alternatively, omit inplace and reassign: df = df.drop(columns=to_drop))
PD: What are the two ways to exclude columns that aren’t useful to us?
Drop them using df.drop(), or don’t import them from the csv in the first place by passing the usecols parameter to pd.read_csv()
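The two approaches can be compared in a quick sketch. The CSV text and column names are hypothetical, and io.StringIO stands in for a real file:

```python
import io
import pandas as pd

# Hypothetical CSV contents; "notes" is the column we do not want.
csv_text = "id,name,notes\n1,ann,x\n2,bob,y\n"

# Option 1: read everything, then drop the unwanted column.
df_a = pd.read_csv(io.StringIO(csv_text)).drop(columns=["notes"])

# Option 2: only read the columns we want, using usecols.
df_b = pd.read_csv(io.StringIO(csv_text), usecols=["id", "name"])

print(df_a.equals(df_b))
```

Using usecols avoids loading the unwanted columns at all, which can matter for large files.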
PD: If I write df["a"], what data structure comes out?
A pandas Series
PD: How do I tell if column "A" in a dataframe contains all unique values?
df["A"].is_unique
PD: How do I set a column, such as column "A", to be the index of a dataframe?
df = df.set_index("A")
If you want a unique index for row names, first check that the column contains unique values using df["A"].is_unique
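A minimal sketch of checking uniqueness before setting the index (the sample data is invented for illustration):

```python
import pandas as pd

# Hypothetical data; column "A" holds the values we want as row labels.
df = pd.DataFrame({"A": ["x", "y", "z"], "B": [1, 2, 3]})

# Only promote "A" to the index if every value is distinct.
if df["A"].is_unique:
    df = df.set_index("A")

# Rows can now be looked up by their "A" label.
print(df.loc["y", "B"])
```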
PD: What is a convenient way to standardize a column containing multiple formats or data types?
Regular expressions, or regex.
See the Pythonic Data Cleaning article for an example: https://realpython.com/python-data-cleaning-numpy-pandas/
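As one possible illustration (the messy values below are made up, not taken from the linked article), a regex can pull the first four-digit year out of inconsistently formatted strings:

```python
import pandas as pd

# Hypothetical column with the same kind of value in several formats.
s = pd.Series(["1879 [1878]", "[1897?]", "1868", "c. 1905"])

# Capture the first run of exactly four digits in each string.
# expand=False returns a Series because there is a single capture group.
year = s.str.extract(r"(\d{4})", expand=False)

# Convert the extracted strings to numbers.
year = pd.to_numeric(year)

print(year)
```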
PD: How to get a boolean Series showing whether each value in column "A" of a dataframe is null?
df["A"].isnull()
PD: How to find the total number of nulls in column "A" in a dataframe?
df["A"].isnull().sum()
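A small sketch of both null checks on made-up data. Summing works because True counts as 1 and False as 0:

```python
import pandas as pd

# Hypothetical column with two missing values.
df = pd.DataFrame({"A": [1.0, None, 3.0, None]})

# Boolean Series: True where the value is null.
mask = df["A"].isnull()

# Summing the booleans counts the nulls.
n_null = df["A"].isnull().sum()

print(mask)
print(n_null)
```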
PD: How to rename cols in a dataframe?
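df.rename(columns={"oldName": "newName"}, inplace=True)
(columns takes a dict mapping old names to new names, or a function; to replace every name from a list, assign df.columns = listOfNewColNames)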
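A brief sketch of two ways to rename columns, using invented column names: a dict passed to rename changes only the listed columns, while assigning to df.columns replaces all names at once.

```python
import pandas as pd

# Hypothetical dataframe with two columns.
df = pd.DataFrame({"a": [1], "b": [2]})

# Rename selected columns with an old-name -> new-name mapping.
renamed = df.rename(columns={"a": "alpha"})

# Replace every column name from a list (length must match).
relabeled = df.copy()
relabeled.columns = ["first", "second"]

print(list(renamed.columns))
print(list(relabeled.columns))
```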
PD: What does it mean to do a split-apply-combine in pandas?
Say the rows of our dataframe fall into a few different “categories,” and we’d like to find a summary statistic or statistics within each of the categories. We can separate all of the categories into groups (split), then apply one or more summary functions to each group (apply), and lastly combine the summary statistics for each group into a new data frame where each row is now one of the categories (combine).
PD: How can we, in one line of code, do all 3 parts of split-apply-combine?
df.groupby("colNameToGroupBy")["colNameToFindSummaryStatsOn"].agg(["list","of","summary","stat","functions"])
Video for reference: https://www.youtube.com/watch?v=qy0fDqoMJx8
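A minimal worked sketch of split-apply-combine in one line, using made-up data with hypothetical "team" and "points" columns:

```python
import pandas as pd

# Hypothetical data: each row is one game, grouped by team.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "points": [1, 3, 2, 4, 6],
})

# Split by team, apply three summary functions, combine into one table
# with one row per team and one column per summary statistic.
summary = df.groupby("team")["points"].agg(["mean", "max", "count"])

print(summary)
```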