Pandas Flashcards
Drop Columns
df.drop(columns=[‘Column1’, ‘Column2’])
Pivots
df.pivot(columns=’var’, values=’val’)
Sort
df.sort_values(‘column1’)
Order rows by values of a column (low to high).
Rename Columns
df.rename(columns = {‘y’:’year’})
Rename the columns of a DataFrame
Head
df.head(n)
Select first n rows
Tail
df.tail(n)
Select last n rows
Using Query
query() allows Boolean expressions for filtering rows. df.query('Length > 7') df.query('Length > 7 and Width < 8') df.query('Name.str.startswith("abc")', engine="python")
Select rows 10-20.
df.iloc[10:20]
Select columns in positions 1, 2 and 5 (first
column is 0).
df.iloc[:, [1, 2, 5]]
Access single value by index
df.iat[1, 2]
Access single value by label
df.at[4, ‘A’]
Select rows meeting logical condition, and only the specific columns .
df.loc[df[‘a’] > 10, [‘a’, ‘c’]]
Append rows of DataFrames
pd.concat([df1,df2])
Append columns of DataFrames
pd.concat([df1,df2], axis=1)
Gather columns into rows.
pd.melt(df)
Logic in Python (and pandas)
< Less than != Not equal to > Greater than df.column.isin(values) Group membership == Equals pd.isnull(obj) Is NaN <= Less than or equals pd.notnull(obj) Is not NaN >= Greater than or equals &,|,~,^,df.any(),df.all() Logical and, or, not, xor, any, all
Count number of rows with each unique value of variable
df[‘w’].value_counts()
Tuple of # of rows, # of columns in DataFrame.
df.shape
of distinct values in a column.
df[‘w’].nunique()
Basic descriptive and statistics for each column (or GroupBy).
df.describe()
Drop rows with any column having NA/null data.
df.dropna()
Replace all NA/null data with value
df.fillna(value)
Compute and append one or more new columns.
df.assign(Area=lambda df: df.Length*df.Height)
Return a GroupBy object, grouped
by values in column named “col”
df.groupby(by=”col”)
size()
Size of each group.
agg(function)
Aggregate group using function
Histogram for each column
df.plot.hist()
Scatter chart using pairs of points
df.plot.scatter(x=’w’,y=’h’)
Merge:
Join matching rows from bdf to adf.
pd.merge(adf, bdf,
how=’left’, on=’x1’)
Merge:
Join matching rows from adf to bdf. (Right join)
pd.merge(adf, bdf,
how=’right’, on=’x1’)
Merge:
Join data. Retain only rows in both sets.
pd.merge(adf, bdf,
how=’inner’, on=’x1’)
Merge:
Join data. Retain all values, all rows.
pd.merge(adf, bdf,
how=’outer’, on=’x1’)
Filtering Joins:
All rows in adf that have a match in bdf.
adf[adf.x1.isin(bdf.x1)]
Filtering Joins:
All rows in adf that do not have a match in bdf.
adf[~adf.x1.isin(bdf.x1)]
Set-like Operations:
pd.merge(ydf, zdf)
Rows that appear in both ydf and zdf
(Intersection).
pd.merge(ydf, zdf, how=’outer’)
Rows that appear in either or both ydf and zdf
(Union).
pd.merge(ydf, zdf, how='outer', indicator=True) .query('_merge == "left_only"') .drop(columns=['_merge']) Rows that appear in ydf but not zdf (Setdiff).
Creating Dataframes:
Specify values for each column.
df = pd.DataFrame( {"a" : [4, 5, 6], "b" : [7, 8, 9], "c" : [10, 11, 12]}, index = [1, 2, 3])
Creating Dataframes:
Specify values for each row.
df = pd.DataFrame( [[4, 7, 10], [5, 8, 11], [6, 9, 12]], index=[1, 2, 3], columns=['a', 'b', 'c'])
Creating Dataframes:
Create DataFrame with a MultiIndex
df = pd.DataFrame( {"a" : [4 ,5, 6], "b" : [7, 8, 9], "c" : [10, 11, 12]}, index = pd.MultiIndex.from_tuples( [('d’, 1), ('d’, 2), ('e’, 2)], names=['n’, 'v']))
Read csv with Pandas
pd.read_csv(“ruta.csv”, index =FALSE)
Alternative way of creatina a Pivot table
pd.pivot_table(df, values= 0, index=[‘col 1’], columns=[‘col2’], aggfunc =np.sum)
Save df as csv
df.to_csv(“filename”, index=False)
Save as XLSX
with pd.ExcelWriter(“file_name”) as writer:
df. to_excel(writer,sheet_name=“name”,index =false)
df2. to_excel(writer, sheet_name=“name2”, index=false)