Pandas Flashcards
How do you select multiple columns of a pandas DataFrame?
df[['column 1', 'column 2']]
What are the three ways to index data in pandas?
df[]
df.loc[]
df.iloc[]
How do you select one column from a dataframe as a series?
df['food']
How do you select one column from a DataFrame as a DataFrame?
df[['food']]
How do you select multiple columns from a dataframe?
df[['color', 'food', 'score']]
Can you change the column order when selecting columns?
Yes; the columns come back in the order you list them
When selecting a column from a dataframe as a series what happens to the column label?
Becomes the name of the series
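A minimal sketch of the column-selection cards above, using a small hypothetical DataFrame (the `color`, `food` and `score` values are made up for illustration):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "color": ["blue", "green", "red"],
    "food": ["Steak", "Lamb", "Mango"],
    "score": [4.6, 8.3, 9.0],
})

food_series = df["food"]            # single string -> Series
food_frame = df[["food"]]           # list with one string -> DataFrame
reordered = df[["score", "color"]]  # columns come back in the order listed

print(type(food_series).__name__)   # Series
print(type(food_frame).__name__)    # DataFrame
print(food_series.name)             # food (the column label becomes the Series name)
print(list(reordered.columns))      # ['score', 'color']
```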
How do you select rows and columns using .loc?
df.loc[row_selection, column_selection]
How do you select multiple rows and columns using .loc?
df.loc[['Dean', 'Cornelia'], ['age', 'state', 'score']]
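Do you need quotation marks around a variable that holds a list of labels when selecting rows/columns?
No: pass the variable itself, e.g. df.loc[rows, cols]; quotes are only needed for literal labels such as 'Dean'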
Does slicing with .loc include the last item?
Yes; label slices include both endpoints
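A short sketch of .loc selection and inclusive label slicing. The names, states, ages and scores are hypothetical data chosen to match the card's example labels:

```python
import pandas as pd

# Hypothetical data indexed by name
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL"],
        "age": [30, 2, 12, 4],
        "score": [4.6, 8.3, 9.0, 3.3],
    },
    index=["Jane", "Niko", "Dean", "Cornelia"],
)

# Lists of labels for rows and columns -> DataFrame
subset = df.loc[["Dean", "Cornelia"], ["age", "state", "score"]]

# The same selection through variables that hold the lists
rows = ["Dean", "Cornelia"]
cols = ["age", "state", "score"]
from_vars = df.loc[rows, cols]

# Label slices include BOTH endpoints ("Niko" through "Dean")
sliced = df.loc["Niko":"Dean", "age":"score"]

print(list(sliced.index))    # ['Niko', 'Dean']
print(list(sliced.columns))  # ['age', 'score']
```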
What does .iloc index on?
Integer position (starting at 0), not labels
What does df.iloc[3] find?
The 4th row (positions start at 0), returned as a Series
To select multiple rows by integer position with .iloc, what do you pass?
A list, e.g. df.iloc[[5, 2, 4]]
How do you slice rows using .iloc?
df.iloc[3:5] (no double brackets needed; returns positions 3 and 4, since the end position is excluded)
How do you select rows and columns using .iloc and integers?
df.iloc[[2, 3], [0, 4]]
How do you select rows using a slice and columns using integers using iloc?
df.iloc[3:6, [1, 4]]
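A sketch of the .iloc cards on a hypothetical 7-row, 5-column DataFrame (columns `c0` to `c4` are made-up names), showing single positions, lists, slices and mixed row/column selection:

```python
import pandas as pd

# Hypothetical data: column c{i} holds i*10 .. i*10+6
df = pd.DataFrame({f"c{i}": list(range(i * 10, i * 10 + 7)) for i in range(5)})

fourth_row = df.iloc[3]          # single integer -> the 4th row as a Series
picked = df.iloc[[5, 2, 4]]      # list of positions -> DataFrame, in that order
sliced = df.iloc[3:5]            # slice -> positions 3 and 4 (end excluded)
cells = df.iloc[[2, 3], [0, 4]]  # lists for both rows and columns
mixed = df.iloc[3:6, [1, 4]]     # slice for rows, list for columns

print(fourth_row["c4"])          # 43
print(list(picked.index))        # [5, 2, 4]
```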
What should you use the indexing operator [] for?
Columns: (1) a string returns a Series; (2) a list of strings returns a DataFrame. Rows: (3) a slice; (4) a boolean Series
Can you use the indexing operator to select both rows and columns?
No
Can you use the indexing operator to select rows?
Yes, but avoid it; use .loc or .iloc to select rows
How do you set the index after reading in a CSV?
df.set_index('col')
What can you use dot notation for?
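Selecting a single column, e.g. df.food (only works when the name is a valid Python identifier and does not clash with a DataFrame attribute)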
What two ways can you use for boolean selection?
[] and .loc
What method can you use to test for multiple values in the same column?
.isin(), e.g. df['state'].isin(['TX', 'CA'])
What method can find all missing values in a column?
.isnull() (alias .isna())
What logical operators combine boolean conditions in pandas?
& (and), | (or) and ~ (not); wrap each condition in parentheses
When should you use [] and .loc for boolean queries?
[] when filtering rows only; .loc when selecting both rows and columns
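A sketch of the boolean-selection cards on hypothetical data with one missing score: [] for row filters, .loc for rows plus columns, .isin(), .isnull(), the &/~ operators, and .sum() to count matches:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "TX", "CA"],
        "age": [30, 2, 12, 4, 33],
        "score": [4.6, 8.3, np.nan, 3.3, 1.8],
    },
    index=["Jane", "Niko", "Aaron", "Dean", "Cornelia"],
)

# [] with a boolean Series filters rows only
adults = df[df["age"] > 18]

# Combine conditions with &, |, ~ and wrap each one in parentheses
tx_or_ca_young = df[df["state"].isin(["TX", "CA"]) & ~(df["age"] > 30)]

# .loc filters rows AND selects columns in one step
missing_scores = df.loc[df["score"].isnull(), ["state", "age"]]

# True counts as 1 and False as 0, so .sum() counts matching rows
n_tx = (df["state"] == "TX").sum()
print(n_tx)  # 2
```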
When slicing using .loc do you need to put the slice in []?
No
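What is the difference between df.iloc[2, :] and df.iloc[[2], :]?
The first returns a Series, the second a one-row DataFrame (plain df[2, :] raises an error, because [] does not take a row, column pair)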
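A tiny sketch of the Series-versus-DataFrame difference, on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # hypothetical data

row_as_series = df.iloc[2, :]   # scalar position -> Series
row_as_frame = df.iloc[[2], :]  # one-item list -> 1-row DataFrame
```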
When grouping data, how can you sort by value?
df.sort_values()
When grouping data, how can you sort by index?
df.sort_index()
When grouping data and sorting by value, how do you sort by decreasing value?
df.sort_values(by='col name', ascending=False)
How do you sort by more than one column at a time?
df.sort_values(by=['col1', 'col2'])
How do you apply more than one calculation to a column?
df.groupby('name')['col'].agg(['min', 'max'])
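A sketch of grouping with several aggregations, then sorting by value, by index, and by more than one column. The sales data is hypothetical; string aggregation names are used rather than the built-in min/max:

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "name": ["a", "b", "a", "b", "c"],
    "region": ["N", "S", "S", "N", "N"],
    "sales": [10, 3, 7, 12, 5],
})

# Several aggregations on one column
stats = df.groupby("name")["sales"].agg(["min", "max"])

by_max = stats.sort_values(by="max", ascending=False)  # by value, largest first
by_name_desc = stats.sort_index(ascending=False)       # by index label
multi = df.sort_values(by=["region", "sales"])         # region first, then sales

print(list(by_max.index))  # ['b', 'a', 'c']
```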
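What does size do?
For a Series, the number of elements (rows); for a DataFrame, rows * columns. On a groupby, .size() returns the number of rows in each group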
What is the difference between size and count?
size includes NaN, count does not
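A sketch contrasting size and count on hypothetical data with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "x", "y"],
    "points": [1.0, np.nan, 3.0],
})  # hypothetical data with one missing value

per_group_size = df.groupby("team").size()              # rows per group, NaN included
per_group_count = df.groupby("team")["points"].count()  # non-missing values only

print(df["points"].size)     # 3 -> number of elements in the Series
print(df["points"].count())  # 2 -> NaN excluded
print(df.size)               # 6 -> rows * columns for a DataFrame
```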
Is dtype a method or an attribute?
An attribute, so no parentheses: df['col'].dtype
How do you change the type of data?
astype(), e.g. df['col'].astype('float64')
How can you count how many rows of a boolean query are True?
.sum() (True counts as 1, False as 0)
How can you count by category?
.value_counts()
What method can you use to change a column name?
df.rename()
How do you rename columns?
df.rename(columns={'original1': 'new1', 'original2': 'new2'})
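A sketch of renaming with a mapping dict, then changing a column's type with astype and checking it with the dtype attribute. Column names and values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"old_a": ["1", "2"], "old_b": [3, 4]})  # hypothetical columns

# Map each original name to its new name; rename returns a new DataFrame
renamed = df.rename(columns={"old_a": "a", "old_b": "b"})

# Convert the string column to integers, then inspect its dtype (no parentheses)
renamed["a"] = renamed["a"].astype("int64")
print(renamed["a"].dtype)  # int64
```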
What do you use to add more rows to data that has the same column names?
pd.concat([df1,df2])
When using concat, do you pass it the DataFrame names or a list?
A list, e.g. pd.concat([df1, df2])
What is shape?
An attribute
shape() or shape?
shape
What is dtype?
An attribute
dtype() or dtype?
dtype
How do you tell the data type of one column?
df['col'].dtype
How do you get the number of columns?
df.shape[1]
How do you get the data types of a dataframe?
df.dtypes
How do you work out how many unique values a column has?
df['col'].nunique()
How do you get summary statistics for a dataframe?
df.describe()
With the default options, what does describe give you?
Just the numerical columns
What do you need to pass to describe to get a summary of all the columns?
include='all', e.g. df.describe(include='all')
What is the code to import a CSV?
pd.read_csv(file, sep='x'), where 'x' is the separator (',' by default)
How do you import a tab-separated file?
sep='\t'
How do you divide elementwise in a data frame?
.div
What does passing a dict {} to agg do?
It maps each column name to the aggregation to apply to that column, e.g. .agg({'points': 'max', 'minutes': 'sum'})
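A sketch of a per-column aggregation dict after a groupby, on hypothetical data (`points` and `minutes` are made-up column names):

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "x", "y"],
    "points": [1, 5, 3],
    "minutes": [10, 20, 30],
})  # hypothetical data

# Keys are column names, values are the aggregation for that column
summary = df.groupby("team").agg({"points": "max", "minutes": "sum"})
print(summary)
```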
What does unstack do?
Pivots an index level into columns (by default the innermost level, i.e. the second level of a two-level index)
How do you pivot second level index?
.unstack()
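A sketch of unstack on a two-level index produced by a groupby; the year/quarter sales figures are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [10, 12, 11, 15],
})  # hypothetical data

# Grouping by two keys gives a Series with a two-level index
grouped = df.groupby(["year", "quarter"])["sales"].sum()

# unstack() moves the last (here: second) index level into the columns
wide = grouped.unstack()
print(wide)
```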
How do you convert data to a time / date?
pd.to_datetime
How do you remove a column of data?
df.drop()
How does drop work?
Pass the column label(s) to the method, e.g. df.drop(columns=['col']), rather than selecting the column before calling it
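How do you remove blank columns?
df.dropna(axis=1, how='all'): axis=1 targets columns, and how='all' drops only completely empty ones (the default how='any' also drops partly empty columns)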
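A sketch of drop and dropna on columns, using hypothetical data with one fully empty and one partly empty column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "keep": [1, 2, 3],
    "unwanted": [4, 5, 6],
    "blank": [np.nan, np.nan, np.nan],
    "partial": [1.0, np.nan, 3.0],
})  # hypothetical data

# Name the column(s) inside the method call
no_unwanted = df.drop(columns=["unwanted"])

# axis=1 works on columns; how="all" drops only columns that are entirely empty
no_blank = df.dropna(axis=1, how="all")

# The default how="any" also drops columns with at least one missing value
any_missing = df.dropna(axis=1)
```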
How do you join 2 dataframes together?
pd.concat([frames])
When joining two DataFrames, how can you record which rows came from each?
keys=['x', 'y']
How do you generate random numbers?
np.random.randint
What is an alternative to pd.concat?
DataFrame.append(), but it was deprecated in pandas 1.4 and removed in 2.0, so use pd.concat instead
How do you identify where the data came from when concatenating?
pd.concat([frames], keys=['x', 'y'])
What is an alternative to keys when concatenating two DataFrames?
Instead of pd.concat([frames]), pass a dictionary: pd.concat({'x': data1, 'y': data2})
By default, how does pd.concat combine the data?
Row-wise (axis=0): the second frame is added below the first, with an outer join on the columns
How do you make pd.concat add data as columns?
pd.concat([frames], axis = 1)
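A sketch of the concat cards: default row-wise stacking, keys=, a dict of frames, and axis=1. The two frames are hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
df2 = pd.DataFrame({"id": [3], "value": ["c"]})  # hypothetical frames

stacked = pd.concat([df1, df2])                    # default axis=0: rows added below
labelled = pd.concat([df1, df2], keys=["x", "y"])  # outer index level records the source
from_dict = pd.concat({"x": df1, "y": df2})        # dict keys act like keys=
side_by_side = pd.concat([df1, df2], axis=1)       # axis=1: frames placed as columns
```

Note that stacked keeps the original index labels (0, 1, 0); pass ignore_index=True if a fresh 0..n-1 index is wanted.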
What is the default behaviour of merge?
An inner join: it keeps only the rows whose keys match in both DataFrames
How do you get a full outer join?
pd.merge(df1, df2, on='x', how='outer')
When you merge DataFrames that share column names, how do you tell which column came from which DataFrame?
By default pandas adds the suffixes _x / _y, but you can change this with the suffixes argument
What does a right join do?
It takes all the entries from the 'right' table and returns the matching entries from the 'left'
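A sketch of the merge cards: the default inner join, outer and right joins, and custom suffixes for overlapping column names. Both frames are hypothetical:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "val": ["B", "C", "D"]})  # hypothetical frames

inner = pd.merge(left, right, on="key")                    # default how="inner"
outer = pd.merge(left, right, on="key", how="outer")       # all keys from both sides
right_join = pd.merge(left, right, on="key", how="right")  # all keys from right
labelled = pd.merge(left, right, on="key", suffixes=("_left", "_right"))

print(list(inner.columns))  # ['key', 'val_x', 'val_y']
```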
What are the different options for merging data?
concat, merge, join
What are the defaults for concat, merge and join?
concat: row-wise (axis=0), outer join
merge: column-wise, inner join
join: column-wise, left join
How do concat and merge take DataFrames?
concat takes a list: pd.concat([df1, df2])
merge takes two frames: pd.merge(df1, df2)
Why choose merge over concat?
concat aligns the data on the index or column labels along an axis; merge lets you specify which columns or keys to match on
What data types can pd.concat take?
Series or DataFrame objects, not NumPy arrays
How do you read whitespace-separated data?
sep=r'\s+' (a regular expression for one or more whitespace characters)
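A sketch of read_csv with tab and whitespace separators, followed by set_index. In-memory strings (via io.StringIO) stand in for files on disk; the contents are hypothetical:

```python
import io

import pandas as pd

# In-memory text stands in for files on disk (hypothetical data)
tab_text = "name\tscore\nJane\t4.6\nNiko\t8.3\n"
space_text = "name   score\nJane  4.6\nNiko     8.3\n"

tabbed = pd.read_csv(io.StringIO(tab_text), sep="\t")
spaced = pd.read_csv(io.StringIO(space_text), sep=r"\s+")  # any run of whitespace

# Use a column as the index after reading
indexed = tabbed.set_index("name")
```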
How do you read a string as a datetime?
from dateutil.parser import parse
parse('January 31, 2010')
What are the different classes in the datetime module?
datetime, date, time, timedelta
In datetime formats, what is the difference between %Y and %y?
%Y is the four-digit year (2020) and %y the two-digit year (20)
How do you convert a date to a string?
strftime
How do you get the day of the week?
.weekday() (Monday is 0, Sunday is 6)
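A sketch of the date cards: dateutil's parse, strftime with %Y and %y, weekday(), strptime with an explicit format, and pd.to_datetime on a column. python-dateutil is a pandas dependency, so it should already be installed; the dates are arbitrary examples:

```python
from datetime import datetime

import pandas as pd
from dateutil.parser import parse

stamp = parse("January 31, 2010")   # flexible string -> datetime
print(stamp.strftime("%Y-%m-%d"))   # 2010-01-31
print(stamp.strftime("%y"))         # 10 (two-digit year)
print(stamp.weekday())              # 6 -> Sunday (Monday is 0)

# Parsing with an explicit format, and the pandas equivalent for a column
fixed = datetime.strptime("2020-03-01", "%Y-%m-%d")
dates = pd.to_datetime(pd.Series(["2020-03-01", "2020-03-07"]))
print(dates.dt.weekday.tolist())    # [6, 5]
```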
When you pass a dictionary to pd.DataFrame, does it retain the original order of columns?
Yes: since pandas 0.23 (on Python 3.6+), columns follow the dict's insertion order; older versions sorted them alphabetically
What is the syntax to create a dictionary?
{'col': ['x', 'y', 'z'],
 'col2': [1, 2, 3]}
How do you order the columns in pd.DataFrame?
pd.DataFrame(data, columns = [])
pd.DataFrame or pd.Dataframe?
pd.DataFrame
How do you set an index when using pd.DataFrame?
pd.DataFrame(data, columns=[...], index=[...])
index= expects a list of labels with one entry per row, not a column name. To use an existing column as the index, follow the constructor with .set_index('col')
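A sketch of building a DataFrame from a dict: insertion-ordered columns, an explicit column order, an index= list, and set_index. The `name`/`age` keys are hypothetical, chosen so that insertion order differs from alphabetical order:

```python
import pandas as pd

data = {
    "name": ["x", "y", "z"],
    "age": [1, 2, 3],
}  # hypothetical data

df = pd.DataFrame(data)                                # columns follow dict insertion order
reordered = pd.DataFrame(data, columns=["age", "name"])  # explicit column order
labelled = pd.DataFrame(data, index=["a", "b", "c"])   # index= takes one label per row
by_column = pd.DataFrame(data).set_index("name")       # an existing column as the index
```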
How do you establish whether there are any duplicate values in a Series?
s.is_unique (an attribute, without parentheses); it is False if the Series contains duplicates
Can you use is_unique on a DataFrame?
No, but you can use it on a column (df['col'].is_unique) or on the index (df.index.is_unique)
is_unique or is_unique()
is_unique
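A sketch of is_unique on a Series and on an index, with hypothetical values:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3])             # hypothetical values with a duplicate
print(s.is_unique)                      # False
print(s.drop_duplicates().is_unique)    # True
print(s.nunique())                      # 3 distinct values

df = pd.DataFrame({"v": [1, 2]}, index=["a", "a"])  # hypothetical duplicated index
print(df.index.is_unique)               # False
```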
Does slicing .loc with label names include the last item or not?
It includes the last item