Pandas Flashcards by Ernesto Almeida

Drop Columns

df.drop(columns=[‘Column1’, ‘Column2’])

How well did you know this?

Not at all

Perfectly

Pivots

df.pivot(columns=’var’, values=’val’)

How well did you know this?

Not at all

Perfectly

Sort

df.sort_values(‘column1’)

Order rows by values of a column (low to high).

How well did you know this?

Not at all

Perfectly

Rename Columns

df.rename(columns = {‘y’:’year’})

Rename the columns of a DataFrame

How well did you know this?

Not at all

Perfectly

Head

df.head(n)

Select first n rows

How well did you know this?

Not at all

Perfectly

Tail

df.tail(n)

Select last n rows

How well did you know this?

Not at all

Perfectly

Using Query

query() allows Boolean expressions for filtering
rows.
df.query('Length > 7')
df.query('Length > 7 and Width < 8')
df.query('Name.str.startswith("abc")',
engine="python")

How well did you know this?

Not at all

Perfectly

Select rows 10-20.

df.iloc[10:20]

How well did you know this?

Not at all

Perfectly

Select columns in positions 1, 2 and 5 (first

column is 0).

df.iloc[:, [1, 2, 5]]

How well did you know this?

Not at all

Perfectly

Access single value by index

df.iat[1, 2]

How well did you know this?

Not at all

Perfectly

Access single value by label

df.at[4, ‘A’]

How well did you know this?

Not at all

Perfectly

Select rows meeting logical condition, and only the specific columns .

df.loc[df[‘a’] > 10, [‘a’, ‘c’]]

How well did you know this?

Not at all

Perfectly

Append rows of DataFrames

pd.concat([df1,df2])

How well did you know this?

Not at all

Perfectly

Append columns of DataFrames

pd.concat([df1,df2], axis=1)

How well did you know this?

Not at all

Perfectly

Gather columns into rows.

pd.melt(df)

How well did you know this?

Not at all

Perfectly

Logic in Python (and pandas)

< Less than 
!= Not equal to
> Greater than 
df.column.isin(values)  Group membership
== Equals 
pd.isnull(obj) Is NaN
<= Less than or equals pd.notnull(obj) Is not NaN
>= Greater than or equals &,|,~,^,df.any(),df.all() Logical and, or, not, xor, any, all

How well did you know this?

Not at all

Perfectly

Count number of rows with each unique value of variable

Study These Flashcards

df[‘w’].value_counts()

Tuple of # of rows, # of columns in DataFrame.

Study These Flashcards

df.shape

of distinct values in a column.

Study These Flashcards

df[‘w’].nunique()

Basic descriptive and statistics for each column (or GroupBy).

Study These Flashcards

df.describe()

Drop rows with any column having NA/null data.

Study These Flashcards

df.dropna()

Replace all NA/null data with value

Study These Flashcards

df.fillna(value)

Compute and append one or more new columns.

Study These Flashcards

df.assign(Area=lambda df: df.Length*df.Height)

Return a GroupBy object, grouped

by values in column named “col”

Study These Flashcards

df.groupby(by=”col”)

size()
Size of each group.
agg(function)
Aggregate group using function

Histogram for each column

df.plot.hist()

Scatter chart using pairs of points

df.plot.scatter(x='w',y='h')

Merge: | Join matching rows from bdf to adf.

pd.merge(adf, bdf, | how='left', on='x1')

Merge: | Join matching rows from adf to bdf. (Right join)

pd.merge(adf, bdf, | how='right', on='x1')

Merge: | Join data. Retain only rows in both sets.

pd.merge(adf, bdf, | how='inner', on='x1')

Merge: | Join data. Retain all values, all rows.

pd.merge(adf, bdf, | how='outer', on='x1')

Filtering Joins: All rows in adf that have a match in bdf.

adf[adf.x1.isin(bdf.x1)]

Filtering Joins: All rows in adf that do not have a match in bdf.

adf[~adf.x1.isin(bdf.x1)]

Set-like Operations:

pd.merge(ydf, zdf) Rows that appear in both ydf and zdf (Intersection). pd.merge(ydf, zdf, how='outer') Rows that appear in either or both ydf and zdf (Union). ``` pd.merge(ydf, zdf, how='outer', indicator=True) .query('_merge == "left_only"') .drop(columns=['_merge']) Rows that appear in ydf but not zdf (Setdiff). ```

Creating Dataframes: Specify values for each column.

``` df = pd.DataFrame( {"a" : [4, 5, 6], "b" : [7, 8, 9], "c" : [10, 11, 12]}, index = [1, 2, 3]) ```

Creating Dataframes: Specify values for each row.

``` df = pd.DataFrame( [[4, 7, 10], [5, 8, 11], [6, 9, 12]], index=[1, 2, 3], columns=['a', 'b', 'c']) ```

Creating Dataframes: Create DataFrame with a MultiIndex

``` df = pd.DataFrame( {"a" : [4 ,5, 6], "b" : [7, 8, 9], "c" : [10, 11, 12]}, index = pd.MultiIndex.from_tuples( [('d’, 1), ('d’, 2), ('e’, 2)], names=['n’, 'v'])) ```

Read csv with Pandas

pd.read_csv(“ruta.csv”, index =FALSE)

Alternative way of creatina a Pivot table

pd.pivot_table(df, values= 0, index=[‘col 1’], columns=[‘col2’], aggfunc =np.sum)

Save df as csv

df.to_csv(“filename”, index=False)

Save as XLSX

with pd.ExcelWriter(“file_name”) as writer: df. to_excel(writer,sheet_name=“name”,index =false) df2. to_excel(writer, sheet_name=“name2”, index=false)

Pandas Flashcards

(40 cards)