Pandas Flashcards

1
Q

Filter df for when Column is null

A

df[ df.Column.isnull() ]

2
Q

Filter df for when Column is not null

A

df[ df.Column.notna() ]
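
The two null filters above are complements of each other. A minimal sketch, assuming a small made-up frame (the data and the names `null_rows` and `present_rows` are illustrative, not from the cards):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in "Column".
df = pd.DataFrame({"Column": [1.0, np.nan, 3.0], "other": ["a", "b", "c"]})

null_rows = df[df["Column"].isnull()]    # rows where Column is missing
present_rows = df[df["Column"].notna()]  # rows where Column has a value
```

Every row lands in exactly one of the two results, so their lengths add up to `len(df)`.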

3
Q

Create a boolean series for when colA >100 AND colB <0

A

(df.colA > 100) & (df.colB < 0)

4
Q

Create a boolean series for when colA >100 OR colB <0

A

(df.colA > 100) | (df.colB < 0)
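
When combining conditions, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<` in Python. Without them the expression is parsed as a chained comparison and typically fails with a ValueError. A small sketch on made-up data (the values and the names `both` and `either` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data covering all four true/false combinations.
df = pd.DataFrame({"colA": [150, 50, 200, 10], "colB": [-1, -5, 3, 5]})

both = (df["colA"] > 100) & (df["colB"] < 0)    # element-wise AND
either = (df["colA"] > 100) | (df["colB"] < 0)  # element-wise OR

# Either mask can be used directly as a row filter.
rows_matching_both = df[both]
```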

5
Q

Return a dataframe’s data types

A

df.dtypes

6
Q

Return the dimensions of a dataframe

A

df.shape

7
Q

Rename column MANUfacturer as ‘manufacturer’

A

df.rename(columns={'MANUfacturer': 'manufacturer'}, inplace=True)

8
Q

Convert a string column to a float

A

df['column'] = df['column'].astype(float)
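
`astype(float)` raises an error if any string cannot be parsed as a number. A brief sketch of the conversion, plus `pd.to_numeric` with `errors="coerce"` as one possible fallback for messy input (the sample values and the names `messy` and `coerced` are made up):

```python
import pandas as pd

# Clean numeric strings convert directly.
df = pd.DataFrame({"column": ["1.5", "2", "3.25"]})
df["column"] = df["column"].astype(float)

# Hypothetical messy input: unparseable entries become NaN instead of raising.
messy = pd.Series(["1.5", "n/a"])
coerced = pd.to_numeric(messy, errors="coerce")
```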

9
Q

Extract first prefix when string column is split by a dash

A

df.column.str.split('-').str[0]

10
Q

Replace values in Column using a mapping dictionary

A

df.column = df.column.map({'Key': 'newkey', 'Key1': 'newkey1'})
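
One thing to watch with `map`: values that are not keys in the dictionary become NaN, whereas `replace` leaves them untouched. A short sketch comparing the two on a made-up series (the value `"other"` and the variable names are illustrative):

```python
import pandas as pd

s = pd.Series(["Key", "Key1", "other"])
mapping = {"Key": "newkey", "Key1": "newkey1"}

mapped = s.map(mapping)        # "other" is not in the dict, so it becomes NaN
replaced = s.replace(mapping)  # "other" is left unchanged
```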

11
Q

Export dataframe to csv file without index values

A

df.to_csv('filename.csv', index=False)

12
Q

Get meta-data information for the columns of a dataframe

A

df.info()

13
Q

Get the name of the columns in a dataframe

A

df.columns

14
Q

Get descriptive statistics for a column

A

df.column.describe()

15
Q

Get frequencies for each unique value in a column

A

df.column.value_counts()

16
Q

Get the averages of col_B grouped by col_A

A

df.groupby(df.col_A).col_B.mean()

17
Q

Apply the size, min, and max functions to the dataframe grouped by col_A

A

df.groupby(df.col_A).agg(['size', 'min', 'max'])
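
A small sketch of the two groupby cards on made-up data. It groups by column name and selects `col_B` explicitly, which is one common way to write it; the sample values and the names `means` and `summary` are assumptions:

```python
import pandas as pd

df = pd.DataFrame({"col_A": ["x", "x", "y"], "col_B": [1, 3, 10]})

# Mean of col_B within each col_A group.
means = df.groupby("col_A")["col_B"].mean()

# Several aggregations at once; string names avoid passing Python builtins.
summary = df.groupby("col_A")["col_B"].agg(["size", "min", "max"])
```

`summary` has one row per group and one column per aggregation.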

18
Q

Create a pivot table where col_V1 is summed and col_V2 shows the min and max. Have col_1 and col_2 as rows and col_A and col_B as columns. Include grand totals.

A

df.pivot_table(
    values=['col_V1', 'col_V2'],
    index=['col_1', 'col_2'],
    columns=['col_A', 'col_B'],
    aggfunc={
        'col_V1': 'sum',
        'col_V2': ['min', 'max']},
    margins=True)

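
To see what `margins=True` adds, here is a deliberately simplified sketch with a single value column and a single aggregation; it does not reproduce the multi-aggregation form above. The data is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": ["a", "a", "b", "b"],
    "col_A": ["p", "q", "p", "q"],
    "col_V1": [1, 2, 3, 4],
})

# margins=True appends an "All" row and an "All" column holding the totals.
table = df.pivot_table(
    values="col_V1",
    index="col_1",
    columns="col_A",
    aggfunc="sum",
    margins=True,
)
```

The label used for the totals can be changed with the `margins_name` parameter.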
19
Q

List the index (aka row labels) of a dataframe

A

df.index

20
Q

Convert a dataframe (or series) into a numpy array

A

df.to_numpy()

21
Q

Assign colA and colB as the index (multiIndex) of the dataframe

A

df.set_index(['colA', 'colB'], inplace=True)

22
Q

Vertically append two dataframes and assign an additional index indicating which df the row came from

A

pd.concat( [df1,df2], keys=[1, 2])
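
The `keys` argument becomes the outer level of a MultiIndex, so each row stays traceable to its source frame. A minimal sketch with two made-up frames:

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3]})

# Outer index level is the key (1 or 2), inner level is the original index.
stacked = pd.concat([df1, df2], keys=[1, 2])

# Selecting by the outer key recovers the rows that came from df2.
from_df2 = stacked.loc[2]
```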

23
Q

Do an inner join on the indexes of two dataframes and add a suffix to duplicated column names

A

df1.merge(df2,
    left_index=True,
    right_index=True,
    suffixes=('_df1', '_df2'))
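
`merge` defaults to `how='inner'`, so only index labels present in both frames survive. A short sketch on made-up frames that share a column name:

```python
import pandas as pd

df1 = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"value": [10, 30]}, index=["a", "c"])

# Only index "a" exists in both frames; the clashing "value" columns get suffixes.
joined = df1.merge(
    df2,
    left_index=True,
    right_index=True,
    suffixes=("_df1", "_df2"),
)
```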

24
Q

Apply a function element-wise to a series

A

df.col_name.apply(function_name)
-- OR --
df.col_name.map(function_name)

25
Q

Apply a function element-wise to a dataframe

A

df.map(function_name)  # pandas 2.1+; on older versions use df.applymap(function_name)

26
Q

Apply a function along the columns of a dataframe

A

df.apply(function_name)

27
Q

Unpivot a dataframe and name the variable column 'ID' and the value column 'fact'

A

df.melt(id_vars=[col1, col2],
    value_vars=[col3, col4],  # defaults to all non-id_vars
    var_name='ID',
    value_name='fact')
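
A small sketch of the wide-to-long reshape on made-up data, with one id column and two value columns (the column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["r1", "r2"], "col3": [1, 2], "col4": [3, 4]})

# Each (row, value column) pair becomes one row in the long frame.
long_df = df.melt(
    id_vars=["col1"],
    value_vars=["col3", "col4"],
    var_name="ID",
    value_name="fact",
)
```

The result has `len(df) * len(value_vars)` rows, here 4.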

28
Q

Return a boolean mask if a regex pattern is found in a certain column

A

df[col_name].str.contains(pattern)

29
Q

Extract a regex capture group from a column

A

df[col_name].str.extract(pattern)

30
Q

Extract more than one group of patterns from a column

A

df[col_name].str.extract(pattern_with_multiple_capture_groups)
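
A sketch of the regex cards on a made-up series. The pattern and the group names `letters` and `digits` are assumptions for illustration; named groups become the column names of the extracted frame:

```python
import pandas as pd

s = pd.Series(["AB-123", "CD-456", "none"])

# Boolean mask: does the pattern occur anywhere in the string?
has_code = s.str.contains(r"[A-Z]{2}-\d+")

# One output column per capture group; rows without a match are all NaN.
parts = s.str.extract(r"(?P<letters>[A-Z]{2})-(?P<digits>\d+)")
```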

31
Q

Replace a regex or string in a column with another string

A

df[col_name].str.replace(pattern, replacement_string, regex=True)  # regex=False treats pattern as a literal string
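
Since pandas 2.0, `str.replace` treats the pattern as a literal string unless `regex=True` is passed. A brief sketch of both modes on made-up strings:

```python
import pandas as pd

s = pd.Series(["a1b22", "c333"])

# Regex mode: every run of digits is replaced.
no_digits = s.str.replace(r"\d+", "#", regex=True)

# Literal mode: only the exact substring "1" is replaced.
literal = s.str.replace("1", "X", regex=False)
```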

32
Q

Calculate the number of missing values in each column

A

df.isnull().sum()

33
Q

Drop rows with any missing values

A

df.dropna()

34
Q

Drop specific columns

A

df.drop(columns_to_drop, axis=1)

35
Q

Drop columns with fewer than a certain number of non-null values

A

df.dropna(thresh = min_nonnull, axis=1)
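
The missing-value cards above can be tried together on one made-up frame. Note that `thresh` counts non-null values, so `thresh=2` keeps columns with at least two of them (the data and variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],     # 2 non-null values
    "b": [np.nan, np.nan, 6.0],  # 1 non-null value
    "c": [7.0, 8.0, 9.0],        # 3 non-null values
})

missing_per_column = df.isnull().sum()       # count of NaN per column
complete_rows = df.dropna()                  # rows with no NaN at all
dense_columns = df.dropna(thresh=2, axis=1)  # columns with >= 2 non-null values
```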

36
Q

Replace missing values in a column with another value

A

df[col_name] = df[col_name].fillna(replacement_value)

37
Q

Show all duplicate rows in a dataframe

A

df[ df.duplicated( keep=False ) ]

38
Q

Drop rows with duplicate values in only certain columns. Keep the last duplicate row

A

df.drop_duplicates(subset=[col_1, col_2], keep='last')
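
A sketch of both duplicate cards on made-up data. Here `subset` is also passed to `duplicated` so that both calls compare the same two columns; that pairing is a choice made for this example:

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 2],
    "col_2": ["x", "x", "y"],
    "val": [10, 20, 30],
})

# keep=False marks every member of a duplicate group, not just the repeats.
dupes = df[df.duplicated(subset=["col_1", "col_2"], keep=False)]

# keep="last" drops the earlier rows of each duplicate group.
deduped = df.drop_duplicates(subset=["col_1", "col_2"], keep="last")
```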

39
Q

Replace values of column_A with values of column_B when column_A is less than zero

A

df.column_A = df.column_A.mask(
    df.column_A < 0,
    df.column_B)
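
`mask` replaces values where the condition is True and keeps the rest (`where` does the opposite). A minimal sketch on made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({"column_A": [-1, 5, -3], "column_B": [10, 20, 30]})

# Negative entries in column_A are swapped for the value in column_B on the same row.
df["column_A"] = df["column_A"].mask(df["column_A"] < 0, df["column_B"])
```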

40
Q

Reset the index

A

df.reset_index(inplace=True)

41
Q

Do a left join on a shared column named 'ID'

A

df1.merge(df2, on='ID', how='left')

42
Q

Fill in missing values of a dataframe with zeros

A

df.fillna(0, inplace=True)

43
Q

Find the correlations between columns in a dataset

A

df.corr(numeric_only=True)  # numeric_only=True skips non-numeric columns

44
Q

Convert a column of a dataframe into a list

A

new_list = df.column.tolist()

45
Q

Get all rows of a dataframe where the value of a column is not in the elements of a list

A

df[ ~df.column_name.isin(list_values) ]
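
The membership test for this filter is `isin`, and `~` negates the resulting mask. A short sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["a", "b", "c", "d"]})
list_values = ["b", "d"]

# Keep only rows whose value is NOT in list_values.
excluded = df[~df["column_name"].isin(list_values)]
```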

46
Q

Sort a dataframe by col_A descending and col_B ascending and reset the index

A

df.sort_values(['col_A', 'col_B'],
    ascending=[False, True],
    ignore_index=True,
    inplace=True)
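
A small sketch of the mixed-direction sort on made-up data. `ignore_index=True` relabels the sorted rows 0..n-1, which saves a separate `reset_index` call:

```python
import pandas as pd

df = pd.DataFrame({"col_A": [1, 2, 2], "col_B": [5, 9, 7]})

# col_A descending first; ties in col_A are broken by col_B ascending.
df.sort_values(
    ["col_A", "col_B"],
    ascending=[False, True],
    ignore_index=True,
    inplace=True,
)
```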