Pandas Flashcards
Axis?
Axis 0 = rows
Axis 1 = columns
Reorder columns in dataframe
Option 1)
Option 2)
1) df.sort_index(axis=1)
2) df.reindex(columns=sorted(df.columms))
Explore document
1) first rows
2)
3)
Df.head()
Df.info()
Df.describe()
Reorder dataframe
1) by index
2) by a particular column
1) df.sort_index()
2) df.sort_values(by=column)
Remove columns from dataframe
Df.drop([col1, col2], axis=1)
Filter dataframe by a value in a column?
Df[df[‘column’]<5]
Filter dataframe by several conditions
Df[df[‘column’] >0 and df[‘column] == ‘Berlin’]
Filter column in df by list of values
List = [‘one’, ‘two’]
Df[df[‘col’].isin(list)]
Filter column values by contains
Df[df[‘col’].str.contains(pattern)]
Unique values in columm
Df[‘col’].unique()
Filter column values by does not contain?
Df[df[‘col’].str.contains(‘blabla’)]
Join dataframes
1) merge
2) join
1)
Pd.merge(left, right, how=’inner’,on=None,left_on=None, right_on=None
…)
Concatenate datasets
Pd.concat([df1,df2],axis=0)
Create new columns based on existing columns?
1) 2 conditions
2) 2 conditions
1) ifelse
Np.where(condition, ‘yes’, ‘no’)
2) ternary expression
Df[‘col’] = df[‘number’].apply(lambda x:
‘more than 5’ if x > 5 else ‘5 or less’)
Create column based on existing column
1) one column, 3+ conditioma
Df[‘ncol’] = df[‘oldcol’].apply(function)
Create new column based on multiple columns
Df[‘newcol’] = df.apply(lambda row: function(row), axis=1)
Convert x to string
1)
2)
.astype(str)
Str()
Check missing values in column
Isnull().any()
Remove rows with missing values
.dropna()
Fill missing values
.fillna(0)
Filter column values by regex
Df[‘col’].str.contains(pattern,regex=True)
Filter groups where the max. value of cumsum_cpo > 0
test_converting = test.groupby(‘campaign_id’).filter(lambda g: g[‘cumsum_cpo’].max() != 0)
Cumsum by groups
facebook_costs[‘cumsum’] = facebook_costs.groupby(‘campaign_id’)[‘spend’].cumsum()
select index by condittion and convert to list
df.index[conversions_sitelinks[‘conversions’].str.len() > 7].tolist()
Find maximum values per group
result = df.groupby(‘A’).apply(lambda g: g[g[‘B’] == g[‘B’].max()])
Count of values
Value_counts()
Count of values, relative %
Value_counts(normalize=True)[value]
df[‘col’].Value_counts(normalize=True)
group by and aggregate
data[data[‘item’] == ‘call’].groupby(‘month’).agg(
# Get max of the duration column for each group
max_duration=(‘duration’, max),
# Get min of the duration column for each group
min_duration=(‘duration’, min),
# Get sum of the duration column for each group
total_duration=(‘duration’, sum),
# Apply a lambda to date column
num_days=(“date”, lambda x: (max(x) - min(x)).days)
Correlation two columns
df[‘col1’].corr(df[‘col2’])
Percent change to previous row
- datetime as index
pct_change()
Iterate through two lists at once
>>> letters = ['a', 'b', 'c'] >>> numbers = [0, 1, 2] >>> for l, n in zip(letters, numbers): ... print(f'Letter: {l}') ... print(f'Number: {n}')
Create new column with percentage of total from another column
df[‘col_pct’] = df.col / df.col.sum()
Print dimensions of df
print(f’df:{df.shape}’)
Number of unique values in a column
len(set(df.col))
Correlation with all other columns in df
result = df.corr()[[‘col’]]
Join dataframes (merge)
1) index
2) other columns
result = df1.merge(df2, on='index', how='left') result = df1.merge(df2, left_on='col1', right_on='col2', how='left')
Display information to a function
?name
display version of pandas
pd. __version__
Set max. number of rows
pd. set_option(‘display.max_rows’, 500)
pd. options.display.
Remove duplicates from list:
1) list comprehension
2) set
1) res = []
[res.append(x) for x in test_list if x not in res]
2) list(set(test_list))
Unique values in a column
set(df.col)
rename column in dataframe
df.rename(columns={‘two’:’new_name’}, inplace=True)
replace string in a column with another string
df[‘newcol’] = df[‘oldcol’].str.replace(‘str’,’another_str’)
stack two dataframes
transactions = pd.concat([transactions_1, transactions_2],axis=0)
new column: sum over groups
final_new[‘sum’] = final_new.groupby(‘created’)[‘count_transactions’].transform(“sum”)