Pandas Flashcards

1
Q

Axis?

A

Axis 0 = rows

Axis 1 = columns

2
Q

Reorder columns in dataframe
Option 1)
Option 2)

A

1) df.sort_index(axis=1)

2) df.reindex(columns=sorted(df.columns))
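A small sketch of both options on made-up data (the column names are illustrative only). Both sort the columns alphabetically; for an arbitrary order, selecting with a list of column names is one common approach.

```python
import pandas as pd

# Hypothetical example frame with unsorted column names
df = pd.DataFrame({"b": [1], "c": [2], "a": [3]})

# Option 1: sort the column labels
sorted_cols = df.sort_index(axis=1)

# Option 2: reindex with an explicitly sorted list of labels
reindexed = df.reindex(columns=sorted(df.columns))

# Arbitrary order: select columns with a list
custom = df[["c", "a", "b"]]
```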

3
Q

Explore document
1) first rows
2)
3)

A

df.head()
df.info()
df.describe()

4
Q

Reorder dataframe

1) by index
2) by a particular column

A

1) df.sort_index()

2) df.sort_values(by=column)

5
Q

Remove columns from dataframe

A

df.drop([col1, col2], axis=1)

6
Q

Filter dataframe by a value in a column?

A

df[df['column'] < 5]

7
Q

Filter dataframe by several conditions

A

df[(df['col1'] > 0) & (df['col2'] == 'Berlin')]
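A minimal sketch on made-up data (the column names `visits` and `city` are assumptions for illustration). Each condition gets its own parentheses, and the element-wise operators `&` and `|` are used instead of Python's `and` and `or`.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "city": ["Berlin", "Berlin", "Munich", "Hamburg"],
    "visits": [3, 0, 1, 2],
})

# Both conditions must hold (element-wise AND)
both = df[(df["visits"] > 0) & (df["city"] == "Berlin")]

# At least one condition must hold (element-wise OR)
either = df[(df["visits"] > 2) | (df["city"] == "Munich")]
```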

8
Q

Filter column in df by list of values

A

values = ['one', 'two']

df[df['col'].isin(values)]

9
Q

Filter column values by contains

A

df[df['col'].str.contains(pattern)]

10
Q

Unique values in column

A

df['col'].unique()

11
Q

Filter column values by does not contain?

A

df[~df['col'].str.contains('blabla')]
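A short sketch of the negated filter on made-up data. The `~` operator inverts the boolean mask; passing `na=False` is one way to keep missing values from disturbing the mask (here they count as "does not contain").

```python
import pandas as pd

# Hypothetical example data, including one missing value
df = pd.DataFrame({"col": ["blabla one", "two", None, "three blabla"]})

# Mask of rows that contain the substring; missing values count as False
contains = df["col"].str.contains("blabla", na=False)

# Invert the mask to keep rows that do NOT contain the substring
without = df[~contains]
```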

12
Q

Join dataframes

1) merge
2) join

A

1) pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, …)

2) left.join(right, how='left')
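A minimal sketch of both approaches, assuming two small made-up frames that share a `key` column. `pd.merge` matches on columns by default, while `DataFrame.join` aligns on the index.

```python
import pandas as pd

# Hypothetical example frames
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [20, 30, 40]})

# Inner merge keeps only keys present in both frames
inner = pd.merge(left, right, how="inner", on="key")

# Outer merge keeps every key from either frame
outer = pd.merge(left, right, how="outer", on="key")

# join aligns on the index, so the key is moved into the index first
joined = left.set_index("key").join(right.set_index("key"), how="left")
```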

13
Q

Concatenate datasets

A

pd.concat([df1, df2], axis=0)

14
Q

Create new columns based on existing columns?

1) 2 conditions
2) 2 conditions

A

1) if/else
np.where(condition, 'yes', 'no')

2) ternary expression
df['col'] = df['number'].apply(lambda x:
'more than 5' if x > 5 else '5 or less')
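Both options applied to the same made-up column, as a sketch (the new column names `via_where` and `via_apply` are illustrative). `np.where` works on the whole column at once, while `apply` calls the lambda once per value.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"number": [3, 5, 8]})

# Option 1: vectorized if/else
df["via_where"] = np.where(df["number"] > 5, "more than 5", "5 or less")

# Option 2: ternary expression applied value by value
df["via_apply"] = df["number"].apply(
    lambda x: "more than 5" if x > 5 else "5 or less"
)
```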

15
Q

Create column based on existing column

1) one column, 3+ conditions

A

df['ncol'] = df['oldcol'].apply(function)
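A sketch with a hypothetical `size_label` function covering three cases (the thresholds and labels are made up). `np.select` is shown as a possible vectorized alternative; it picks the first matching condition per row.

```python
import numpy as np
import pandas as pd

def size_label(x):
    """Map a number to one of three made-up labels."""
    if x < 10:
        return "small"
    elif x < 100:
        return "medium"
    return "large"

# Hypothetical example data
df = pd.DataFrame({"oldcol": [5, 50, 500]})

# Apply the function to each value of the column
df["ncol"] = df["oldcol"].apply(size_label)

# Vectorized alternative: the first condition that is True wins
conditions = [df["oldcol"] < 10, df["oldcol"] < 100]
df["ncol_select"] = np.select(conditions, ["small", "medium"], default="large")
```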

16
Q

Create new column based on multiple columns

A

df['newcol'] = df.apply(lambda row: function(row), axis=1)

17
Q

Convert x to string
1)
2)

A

1) df['col'].astype(str)

2) str(x)

18
Q

Check missing values in column

A

df['col'].isnull().any()

19
Q

Remove rows with missing values

A

.dropna()

20
Q

Fill missing values

A

.fillna(0)

21
Q

Filter column values by regex

A

df['col'].str.contains(pattern, regex=True)

22
Q

Filter groups where the max. value of cumsum_cpo > 0

A

test_converting = test.groupby('campaign_id').filter(lambda g: g['cumsum_cpo'].max() > 0)
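A small sketch with made-up values for `campaign_id` and `cumsum_cpo`. `filter` keeps or drops whole groups: every row of a group is retained when the lambda returns True for that group.

```python
import pandas as pd

# Hypothetical example data: campaign "b" never rises above 0
test = pd.DataFrame({
    "campaign_id": ["a", "a", "b", "b"],
    "cumsum_cpo": [0, 2, 0, 0],
})

# Keep all rows of groups whose maximum cumsum_cpo is positive
test_converting = test.groupby("campaign_id").filter(
    lambda g: g["cumsum_cpo"].max() > 0
)
```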

23
Q

Cumsum by groups

A

facebook_costs['cumsum'] = facebook_costs.groupby('campaign_id')['spend'].cumsum()
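A sketch on made-up spend figures. The running total restarts for each `campaign_id`, and the result keeps the original row order.

```python
import pandas as pd

# Hypothetical example data with interleaved campaigns
facebook_costs = pd.DataFrame({
    "campaign_id": ["a", "b", "a", "b"],
    "spend": [1.0, 10.0, 2.0, 20.0],
})

# Running total of spend within each campaign
facebook_costs["cumsum"] = (
    facebook_costs.groupby("campaign_id")["spend"].cumsum()
)
```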

24
Q

Select index by condition and convert to list

A

df.index[df['conversions'].str.len() > 7].tolist()

25
Q

Find maximum values per group

A

result = df.groupby('A').apply(lambda g: g[g['B'] == g['B'].max()])
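As a possible alternative to `apply`, the same rows can be selected with `transform`, which broadcasts each group's maximum back to every row. This is a sketch on made-up data, not the card's original method; it keeps the original index instead of adding a group level.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 3, 2]})

# Per-row copy of the group's maximum of B
group_max = df.groupby("A")["B"].transform("max")

# Keep only the rows that hold their group's maximum
result = df[df["B"] == group_max]
```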

26
Q

Count of values

A

df['col'].value_counts()

27
Q

Count of values, relative %

A

df['col'].value_counts(normalize=True)

df['col'].value_counts(normalize=True)[value]

28
Q

group by and aggregate

A

data[data['item'] == 'call'].groupby('month').agg(
    # Get max of the duration column for each group
    max_duration=('duration', 'max'),
    # Get min of the duration column for each group
    min_duration=('duration', 'min'),
    # Get sum of the duration column for each group
    total_duration=('duration', 'sum'),
    # Apply a lambda to the date column
    num_days=('date', lambda x: (max(x) - min(x)).days),
)
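A runnable sketch of the named aggregation on made-up call data (all values are invented for illustration). Each keyword becomes an output column, defined by a `(column, function)` pair.

```python
import pandas as pd

# Hypothetical example data
data = pd.DataFrame({
    "item": ["call", "call", "sms", "call"],
    "month": ["2024-01", "2024-01", "2024-01", "2024-02"],
    "duration": [10.0, 30.0, 1.0, 5.0],
    "date": pd.to_datetime(
        ["2024-01-03", "2024-01-10", "2024-01-05", "2024-02-01"]
    ),
})

# Filter to calls, then aggregate per month
summary = data[data["item"] == "call"].groupby("month").agg(
    max_duration=("duration", "max"),
    min_duration=("duration", "min"),
    total_duration=("duration", "sum"),
    # Days between the first and last call in the month
    num_days=("date", lambda x: (x.max() - x.min()).days),
)
```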

29
Q

Correlation two columns

A

df['col1'].corr(df['col2'])

30
Q

Percent change to previous row

A
  • datetime as index

df['col'].pct_change()
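A short sketch on a made-up series with a datetime index. Each value is compared with the previous row, so the first entry has no predecessor and comes out as NaN.

```python
import pandas as pd

# Hypothetical example series, indexed by date
s = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)

# Fractional change relative to the previous row (0.10 means +10 %)
change = s.pct_change()
```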

31
Q

Iterate through two lists at once

A
>>> letters = ['a', 'b', 'c']
>>> numbers = [0, 1, 2]
>>> for l, n in zip(letters, numbers):
...     print(f'Letter: {l}')
...     print(f'Number: {n}')
32
Q

Create new column with percentage of total from another column

A

df['col_pct'] = df.col / df.col.sum()

33
Q

Print dimensions of df

A

print(f'df:{df.shape}')

34
Q

Number of unique values in a column

A

len(set(df.col))

35
Q

Correlation with all other columns in df

A

result = df.corr()[['col']]

36
Q

Join dataframes (merge)

1) index
2) other columns

A
result = df1.merge(df2, left_index=True, right_index=True, how='left')
result = df1.merge(df2, left_on='col1', right_on='col2', how='left')
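A sketch of both variants on two made-up frames. It assumes the frames share index labels for the first case and have differently named key columns (`col1`, `col2`) for the second.

```python
import pandas as pd

# Hypothetical example frames
df1 = pd.DataFrame({"col1": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"col2": [2, 3], "val": ["x", "y"]}, index=["b", "c"])

# 1) Match rows by index label
by_index = df1.merge(df2, left_index=True, right_index=True, how="left")

# 2) Match rows by differently named key columns
by_cols = df1.merge(df2, left_on="col1", right_on="col2", how="left")
```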
37
Q

Display information about a function

A

?name

38
Q

display version of pandas

A

pd.__version__

39
Q

Set max. number of rows

A

pd.set_option('display.max_rows', 500)

pd.options.display.

40
Q

Remove duplicates from list:

1) list comprehension
2) set

A

1) res = []
[res.append(x) for x in test_list if x not in res]
2) list(set(test_list))
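A sketch comparing the approaches on a made-up list. The loop form of option 1 keeps the first occurrence of each item in order; a set does not guarantee order. `dict.fromkeys` is shown as one more order-preserving option (dicts keep insertion order in Python 3.7+).

```python
# Hypothetical example list with duplicates
test_list = [3, 1, 3, 2, 1]

# Option 1 as an explicit loop: keeps first occurrences in order
res = []
for x in test_list:
    if x not in res:
        res.append(x)

# Option 2: a set removes duplicates, but order is not guaranteed
unique_unordered = set(test_list)

# Order-preserving alternative for hashable items
unique_ordered = list(dict.fromkeys(test_list))
```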

41
Q

Unique values in a column

A

set(df.col)

42
Q

rename column in dataframe

A

df.rename(columns={'two': 'new_name'}, inplace=True)

43
Q

replace string in a column with another string

A

df['newcol'] = df['oldcol'].str.replace('str', 'another_str')

44
Q

stack two dataframes

A

transactions = pd.concat([transactions_1, transactions_2],axis=0)

45
Q

new column: sum over groups

A

final_new['sum'] = final_new.groupby('created')['count_transactions'].transform('sum')
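A sketch on made-up values. Unlike `agg`, `transform` returns one value per original row, so the group total can be assigned straight back as a new column.

```python
import pandas as pd

# Hypothetical example data
final_new = pd.DataFrame({
    "created": ["d1", "d1", "d2"],
    "count_transactions": [1, 2, 5],
})

# Group total repeated on every row of the group
final_new["sum"] = (
    final_new.groupby("created")["count_transactions"].transform("sum")
)
```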