Pandas Flashcards by Ludmila Kuncarova

Axis?

Axis 0 = rows

Axis 1 = columns

How well did you know this?

Not at all

Perfectly

Reorder columns in dataframe
Option 1)
Option 2)

1) df.sort_index(axis=1)

2) df.reindex(columns=sorted(df.columms))

How well did you know this?

Not at all

Perfectly

Explore document
1) first rows
2)
3)

Df.head()
Df.info()
Df.describe()

How well did you know this?

Not at all

Perfectly

Reorder dataframe

1) by index
2) by a particular column

1) df.sort_index()

2) df.sort_values(by=column)

How well did you know this?

Not at all

Perfectly

Remove columns from dataframe

Df.drop([col1, col2], axis=1)

How well did you know this?

Not at all

Perfectly

Filter dataframe by a value in a column?

Df[df[‘column’]<5]

How well did you know this?

Not at all

Perfectly

Filter dataframe by several conditions

Df[df[‘column’] >0 and df[‘column] == ‘Berlin’]

How well did you know this?

Not at all

Perfectly

Filter column in df by list of values

List = [‘one’, ‘two’]

Df[df[‘col’].isin(list)]

How well did you know this?

Not at all

Perfectly

Filter column values by contains

Df[df[‘col’].str.contains(pattern)]

How well did you know this?

Not at all

Perfectly

Unique values in columm

Df[‘col’].unique()

How well did you know this?

Not at all

Perfectly

Filter column values by does not contain?

Df[df[‘col’].str.contains(‘blabla’)]

How well did you know this?

Not at all

Perfectly

Join dataframes

1) merge
2) join

1)
Pd.merge(left, right, how=’inner’,on=None,left_on=None, right_on=None
…)

How well did you know this?

Not at all

Perfectly

Concatenate datasets

Pd.concat([df1,df2],axis=0)

How well did you know this?

Not at all

Perfectly

Create new columns based on existing columns?

1) 2 conditions
2) 2 conditions

1) ifelse
Np.where(condition, ‘yes’, ‘no’)

2) ternary expression
Df[‘col’] = df[‘number’].apply(lambda x:
‘more than 5’ if x > 5 else ‘5 or less’)

How well did you know this?

Not at all

Perfectly

Create column based on existing column

1) one column, 3+ conditioma

Df[‘ncol’] = df[‘oldcol’].apply(function)

How well did you know this?

Not at all

Perfectly

Create new column based on multiple columns

Df[‘newcol’] = df.apply(lambda row: function(row), axis=1)

How well did you know this?

Not at all

Perfectly

Convert x to string
1)
2)

.astype(str)

Str()

How well did you know this?

Not at all

Perfectly

Check missing values in column

Isnull().any()

How well did you know this?

Not at all

Perfectly

Remove rows with missing values

Study These Flashcards

.dropna()

Fill missing values

Study These Flashcards

.fillna(0)

Filter column values by regex

Study These Flashcards

Df[‘col’].str.contains(pattern,regex=True)

Filter groups where the max. value of cumsum_cpo > 0

Study These Flashcards

test_converting = test.groupby(‘campaign_id’).filter(lambda g: g[‘cumsum_cpo’].max() != 0)

Cumsum by groups

Study These Flashcards

facebook_costs[‘cumsum’] = facebook_costs.groupby(‘campaign_id’)[‘spend’].cumsum()

select index by condittion and convert to list

Study These Flashcards

df.index[conversions_sitelinks[‘conversions’].str.len() > 7].tolist()

Find maximum values per group

result = df.groupby('A').apply(lambda g: g[g['B'] == g['B'].max()])

Count of values

Value_counts()

Count of values, relative %

Value_counts(normalize=True)[value] | df['col'].Value_counts(normalize=True)

group by and aggregate

data[data['item'] == 'call'].groupby('month').agg( # Get max of the duration column for each group max_duration=('duration', max), # Get min of the duration column for each group min_duration=('duration', min), # Get sum of the duration column for each group total_duration=('duration', sum), # Apply a lambda to date column num_days=("date", lambda x: (max(x) - min(x)).days)

Correlation two columns

df['col1'].corr(df['col2'])

Percent change to previous row

- datetime as index | pct_change()

Iterate through two lists at once

``` >>> letters = ['a', 'b', 'c'] >>> numbers = [0, 1, 2] >>> for l, n in zip(letters, numbers): ... print(f'Letter: {l}') ... print(f'Number: {n}') ```

Create new column with percentage of total from another column

df['col_pct'] = df.col / df.col.sum()

Print dimensions of df

print(f'df:{df.shape}')

Number of unique values in a column

len(set(df.col))

Correlation with all other columns in df

result = df.corr()[['col']]

Join dataframes (merge) 1) index 2) other columns

``` result = df1.merge(df2, on='index', how='left') result = df1.merge(df2, left_on='col1', right_on='col2', how='left') ```

Display information to a function

?name

display version of pandas

pd. __version__

Set max. number of rows

pd. set_option('display.max_rows', 500) | pd. options.display.

Remove duplicates from list: 1) list comprehension 2) set

1) res = [] [res.append(x) for x in test_list if x not in res] 2) list(set(test_list))

Unique values in a column

set(df.col)

rename column in dataframe

df.rename(columns={'two':'new_name'}, inplace=True)

replace string in a column with another string

df['newcol'] = df['oldcol'].str.replace('str','another_str')

stack two dataframes

transactions = pd.concat([transactions_1, transactions_2],axis=0)

new column: sum over groups

final_new['sum'] = final_new.groupby('created')['count_transactions'].transform("sum")

Pandas Flashcards

(45 cards)