Pandas Flashcards
Select multiple columns of pandas dataframe?
df[['column 1', 'column 2']]
What are the three ways to index data in pandas?
df[ ]
df.loc[ ]
df.iloc[ ]
How do you select one column from a dataframe as a series?
df['food']
How do you select one column from a dataframe as a dataframe?
df[['food']]
How do you select multiple columns from a dataframe?
df[['color', 'food', 'score']]
Can you change the column order when selecting columns?
Yes
When selecting a column from a dataframe as a series what happens to the column label?
Becomes the name of the series
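Example (a minimal sketch of the column-selection cards above; the dataframe and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame(
    {"color": ["red", "blue"], "food": ["apple", "plum"], "score": [4.5, 3.0]},
    index=["Dean", "Cornelia"],
)

as_series = df["food"]              # single label -> Series
as_frame = df[["food"]]             # one-item list -> DataFrame with one column
reordered = df[["score", "color"]]  # columns come back in the order listed

# The column label becomes the name of the Series.
print(type(as_series).__name__, as_series.name)
print(type(as_frame).__name__, list(as_frame.columns))
print(list(reordered.columns))
```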
How do you select rows and columns using .loc?
df.loc[row_selection, column_selection]
How do you select multiple rows and columns using .loc?
df.loc[['Dean', 'Cornelia'], ['age', 'state', 'score']]
Do you need quotation marks around a list's variable name when using it to select rows / columns?
No
Does a .loc slice include the last item?
Yes (the stop label is included, unlike an .iloc slice)
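Example (a rough sketch of the .loc cards above; the data, the extra row label "Niko", and the variable names rows and cols are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame(
    {"age": [30, 41, 25], "state": ["TX", "NY", "CA"], "score": [4.5, 3.0, 5.0]},
    index=["Dean", "Cornelia", "Niko"],
)

rows = ["Dean", "Cornelia"]   # labels inside the list are quoted strings
cols = ["age", "score"]
subset = df.loc[rows, cols]   # the list variables themselves are not quoted

# Label slices need no extra brackets, and both end labels are included.
sliced = df.loc["Dean":"Cornelia", "age":"state"]

print(subset)
print(sliced)
```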
What does .iloc index on?
Integer index location
What does df.iloc[3] find?
The 4th row
To select multiple rows using integers .iloc what do you have to use?
A list df.iloc[[5, 2, 4]]
How do you slice rows using .iloc?
df.iloc[3:5] (no double bracket required)
How do you select rows and columns using .iloc and integers?
df.iloc[[2, 3], [0, 4]]
How do you select rows using a slice and columns using integers using iloc?
df.iloc[3:6, [1, 4]]
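Example (a minimal sketch of the .iloc cards above, assuming a made-up 7 x 5 dataframe of integers):

```python
import pandas as pd

# Hypothetical data: the cell at row r, column c holds r * 10 + c.
df = pd.DataFrame(
    [[r * 10 + c for c in range(5)] for r in range(7)],
    columns=list("abcde"),
)

fourth_row = df.iloc[3]        # a single integer -> the 4th row, as a Series
picked = df.iloc[[5, 2, 4]]    # a list of integers -> rows in the order given
sliced = df.iloc[3:5]          # a slice -> positions 3 and 4 (stop is excluded)
block = df.iloc[3:6, [1, 4]]   # row slice plus a list of column positions

print(fourth_row)
print(picked)
print(sliced)
print(block)
```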
What should you use the indexing operator for?
Columns: 1) A string (returns a series) 2) A list of strings (returns a dataframe). Rows: 3) A slice 4) Booleans
Can you use the indexing operator to select both rows and columns?
No
Can you use the indexing operator to select rows?
Yes but don’t
How do you set the index after reading in the csv?
df.set_index('column name')
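Example (a small sketch, assuming an in-memory CSV with a hypothetical name column; a real file path would work the same way with pd.read_csv):

```python
import io

import pandas as pd

# Hypothetical CSV content, invented for this example.
csv_text = "name,age,state\nDean,30,TX\nCornelia,41,NY\n"

df = pd.read_csv(io.StringIO(csv_text))
df = df.set_index("name")  # returns a new dataframe, so reassign the result

# read_csv can also set the index while reading, via index_col.
same = pd.read_csv(io.StringIO(csv_text), index_col="name")

print(df)
```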
What can you use dot notation for?
Selecting a single column
What 2 methods can you use for boolean selection?
[] and .loc
What method can you use to test one column against several values at once?
isin
What method can find all missing values in a column?
isnull
What logical operators are used to combine boolean conditions in pandas?
And (&), or (|), and not (~)
When should you use [] and when .loc for boolean queries?
[] for just rows, .loc when both rows and columns
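Example (a minimal sketch of the boolean-selection cards above; the dataframe is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "food": ["apple", "plum", "kale", "plum"],
    "score": [4.5, np.nan, 2.0, 5.0],
    "age": [30, 41, 25, 52],
})

plum_or_kale = df[df["food"].isin(["plum", "kale"])]  # several values, one column
missing = df[df["score"].isnull()]                    # rows with a missing score

# Each condition needs its own parentheses when combined with &, | or ~.
both = df[(df["age"] > 28) & (df["score"] > 4)]
either = df[(df["age"] > 50) | (df["food"] == "kale")]
not_plum = df[~(df["food"] == "plum")]

# .loc when columns are selected together with the boolean row filter.
picked = df.loc[df["age"] > 28, ["food", "score"]]

print(both)
print(picked)
```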
When slicing using .loc do you need to put the slice in []?
No
What is the difference between df.iloc[2, :] and df.iloc[[2], :]?
The first returns a series; the second returns a dataframe
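Example (a sketch that assumes the card refers to .iloc, since a tuple such as 2, : is not a valid key for the plain indexing operator on a dataframe; the data is made up):

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

row_series = df.iloc[2, :]    # integer row selector -> Series
row_frame = df.iloc[[2], :]   # one-item list -> DataFrame with a single row

print(type(row_series).__name__, row_series.shape)
print(type(row_frame).__name__, row_frame.shape)
```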
When grouping data, how can you sort by value?
df.sort_values()
When grouping data, how can you sort by index?
df.sort_index()
When grouping data and sorting by value, how do you sort by decreasing value?
df.sort_values(by='col name', ascending=False)
How do you sort by more than one column at a time?
df.sort_values(by=['col1', 'col2'])
How do you use more than one calculation on a column?
df.groupby('name').col.agg(['min', 'max'])
How do you sort by more than one column?
df.sort_values(by=['col1', 'col2'])
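Example (a minimal sketch of the sorting cards above; col1, col2 and the index values are placeholders made up for illustration):

```python
import pandas as pd

# Hypothetical data with a deliberately shuffled index.
df = pd.DataFrame(
    {"col1": ["b", "a", "b", "a"], "col2": [2, 9, 1, 4]},
    index=[3, 0, 2, 1],
)

by_value = df.sort_values(by="col2", ascending=False)  # largest col2 first
by_two = df.sort_values(by=["col1", "col2"])           # col1 first, then col2
by_index = df.sort_index()                             # order by index labels

print(by_value)
print(by_two)
print(by_index)
```

These calls return new dataframes and leave df unchanged.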
What does size() do?
On a groupby, returns the number of rows in each group (on a series, the size attribute gives the number of elements)
What is the difference between size and count?
size includes NaN, count does not
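Example (a rough sketch of the groupby cards above, assuming a made-up dataframe with a name column to group on and a col column containing one missing value):

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "name": ["x", "x", "y", "y", "y"],
    "col": [1.0, np.nan, 3.0, 7.0, 5.0],
})

grouped = df.groupby("name")["col"]

stats = grouped.agg(["min", "max"])  # several calculations on one column
sizes = grouped.size()               # rows per group, NaN included
counts = grouped.count()             # non-missing values per group

print(stats)
print(sizes)
print(counts)
```

Group x has two rows but only one non-missing value, which is where size and count differ.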
What is the format for dtype?
It is an attribute, so no parentheses at the end: df['col'].dtype (or df.dtypes for every column)
How do you change the type of data?
astype()
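Example (a minimal sketch of the dtype and astype cards above, assuming a made-up score column that was read in as text):

```python
import pandas as pd

# Hypothetical data: numbers stored as strings.
df = pd.DataFrame({"score": ["1", "2", "3"]})

before = df["score"].dtype             # attribute, so no parentheses
df["score"] = df["score"].astype(int)  # astype returns a new series; assign it back
after = df["score"].dtype

print(before, after)
```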