Pandas Flashcards
Create a Pandas dataframe from a dictionary
use keys as column labels and lists as the column data, then Df = pd.DataFrame(dict)
e.g.:
countries = {'Name': ['UK', 'Germany'],
             'Capital': ['London', 'Berlin']}
Countriesdf = pd.DataFrame(countries)
set index labels for the rows of a pandas dataframe
pass a list of values to df.index, e.g.:
Countriesdf.index = ['UK', 'GER']
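A minimal runnable sketch of the two cards above, reusing the same country data. The variable name countries_df is just an illustrative choice.

```python
import pandas as pd

# Keys become column labels; each list holds that column's values.
countries = {'Name': ['UK', 'Germany'],
             'Capital': ['London', 'Berlin']}
countries_df = pd.DataFrame(countries)

# Replace the default 0, 1 row labels with custom ones.
countries_df.index = ['UK', 'GER']
print(countries_df)
```

The list assigned to .index needs one label per row, otherwise pandas raises an error.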
import a csv file as a dataframe
Countriesdf = pd.read_csv('file/location/countries.csv')
make the first column of a csv the index of the dataframe (rather than a column in its own right)
Countriesdf = pd.read_csv('file/location/countries.csv', index_col=0)
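To see what index_col changes without needing a file on disk, here is a sketch that reads the same text twice. The CSV contents and the column name Code are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for countries.csv.
csv_text = "Code,Name,Capital\nUK,United Kingdom,London\nGER,Germany,Berlin\n"

# Default: the first column stays an ordinary column; rows are labeled 0, 1.
plain = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the row labels instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain)
print(indexed)
```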
Get a Pandas Series from a Dataframe
df['column name']
Get a single column as a Pandas Dataframe (rather than a Series)
df[['column name']]
Get multiple columns from a Pandas Dataframe
df[['column name1', 'column name2']]
Get the 2nd - 4th (inclusive) rows of a Pandas Dataframe
df[1:4]
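The bracket forms above are easy to mix up, so here is a small sketch that checks what each one returns. The dataframe and its column names a, b, c are made up for illustration.

```python
import pandas as pd

# Made-up data purely for illustration.
df = pd.DataFrame({'a': [10, 20, 30, 40, 50],
                   'b': [1, 2, 3, 4, 5],
                   'c': ['v', 'w', 'x', 'y', 'z']})

as_series = df['a']         # single brackets: a Series
as_frame = df[['a']]        # double brackets: a one-column DataFrame
two_cols = df[['a', 'b']]   # a list of names: a DataFrame with those columns

# A slice selects rows by position and excludes the stop value,
# so 1:4 returns positions 1, 2, 3 (the 2nd to 4th rows).
middle_rows = df[1:4]

print(type(as_series).__name__, type(as_frame).__name__)
print(middle_rows)
```

Because the stop value is excluded, the 2nd to 4th rows are df[1:4]; df[1:5] would also include the 5th row.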
call a row of a Pandas Dataframe with loc, as a Pandas Series
df.loc['row index']
Call 2 rows of a Pandas Dataframe with loc, as a dataframe
df.loc[['row index', 'row index 2']]
call the intersection of 2 rows and 2 columns of a pandas dataframe by name
df.loc[['row index 1', 'row index 2'], ['column name1', 'column name 2']]
get whole columns of a dataframe by name with loc
df.loc[:, ['column name1', 'column name 2']]
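A short sketch of label-based selection with loc, reusing the country data from earlier in the deck. With loc the row labels come first and the column labels second, and a bare colon means "all rows".

```python
import pandas as pd

df = pd.DataFrame({'Name': ['UK', 'Germany'],
                   'Capital': ['London', 'Berlin']},
                  index=['UK', 'GER'])

one_row = df.loc['UK']              # one label: the row as a Series
two_rows = df.loc[['UK', 'GER']]    # list of labels: a DataFrame

# Rows first, then columns: the intersection of the named rows and columns.
block = df.loc[['UK', 'GER'], ['Name', 'Capital']]

# A bare colon keeps every row and selects only the named columns.
capitals = df.loc[:, ['Capital']]

print(one_row)
print(capitals)
```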
Let’s say you have numpy arrays x = [1,2,3,4,5] and y = [5,4,3,2,1]. How do you use operators with them to get an array of bools corresponding to x/y > 2 or y - x == 2?
np.logical_or(x/y > 2, y - x == 2)
(this gives the array [False, True, False, False, True], just out of interest)
create a numpy array with a list of numbers 1,2,3,4,5
np.array([1,2,3,4,5])
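The two NumPy cards above can be run together as follows. The | form at the end is an equivalent alternative for boolean arrays; it needs the parentheses because | binds more tightly than the comparisons.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# Each comparison is applied element by element and yields a boolean array.
mask = np.logical_or(x / y > 2, y - x == 2)

# Equivalent spelling using the element-wise OR operator.
mask_alt = (x / y > 2) | (y - x == 2)

print(mask)
```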
Let’s say you have a dataframe with a list of days and the total minutes of meditation done on those days. What are the two steps to create a list of days on which more than one hour of meditation was done?
- select the meditation length column as a Pandas Series and do a comparison on that column:
hourPlus = meditationDF['duration'] > 60
- apply the results of that comparison to the dataframe:
meditationDF[hourPlus]
Let’s say you have a dataframe with a list of days and the total minutes of meditation done on those days. What are the two steps to create a list of days on which more than one hour of meditation was done BUT less than 2 hours was done?
bt1and2 = np.logical_and(meditationDF['duration'] > 60, meditationDF['duration'] < 120)
meditationDF[bt1and2]
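Both meditation cards follow the same pattern: build a boolean Series, then use it to filter the dataframe. A sketch with made-up data, assuming the columns are called day and duration (only duration appears in the cards above):

```python
import numpy as np
import pandas as pd

# Made-up data; 'day' is an assumed column name.
meditation_df = pd.DataFrame({'day': ['Mon', 'Tue', 'Wed', 'Thu'],
                              'duration': [30, 75, 130, 90]})

# Step 1: compare the column to get one True/False per row.
hour_plus = meditation_df['duration'] > 60

# Step 2: index the dataframe with that boolean Series to keep the True rows.
over_an_hour = meditation_df[hour_plus]

# Two conditions combined element-wise.
between_1_and_2 = np.logical_and(meditation_df['duration'] > 60,
                                 meditation_df['duration'] < 120)

# Pull out just the days as a plain Python list.
days = meditation_df[between_1_and_2]['day'].tolist()
print(days)
```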
iterate over the rows of a dataframe and print out every row and label
for label, row in dataframe.iterrows():
    print(label, '\n', row)
let’s say you’ve got a pandas dataframe with a bunch of columns, and one of them is ‘Day of week’. Iterate over the rows of the dataframe and print out only the value of that column for each row.
for label, row in dataframe.iterrows():
    print(row['Day of week'])
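iterrows is a method of the dataframe itself (NumPy has no iterrows function), so it is called as dataframe.iterrows(). A runnable sketch with made-up data and row labels:

```python
import pandas as pd

# Made-up data; the row labels d1 and d2 are only for illustration.
df = pd.DataFrame({'Day of week': ['Mon', 'Tue'],
                   'duration': [30, 75]},
                  index=['d1', 'd2'])

seen = []
# Each iteration yields the row label and the row as a Series.
for label, row in df.iterrows():
    print(label, row['Day of week'])
    seen.append((label, row['Day of week']))
```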
create a new column in a dataframe by applying a function to an existing column
dataframe['column length'] = dataframe['existing column'].apply(len)
create a new column in a dataframe by applying a method to an existing column
dataframe['NEW COLUMN'] = dataframe['existing column'].apply(str.upper)
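A sketch of both apply cards on a made-up column of strings. The last line shows the .str accessor, which is another way to get the same upper-case result for string columns.

```python
import pandas as pd

# Made-up data purely for illustration.
df = pd.DataFrame({'word': ['london', 'berlin']})

# apply calls the given function once per value in the column.
df['length'] = df['word'].apply(len)

# A method can be passed via its class, here str.upper.
df['upper'] = df['word'].apply(str.upper)

# Alternative for string columns: the .str accessor.
df['upper_alt'] = df['word'].str.upper()

print(df)
```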
get some summary statistics for a dataframe
dataframe.describe()
get all values of a dataframe as a 2d numpy array
df.to_numpy()
get column names of a dataframe
dataframe.columns
get the row numbers or names of a dataframe
dataframe.index
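The last four cards can be tried on one small dataframe. The data and labels here are made up for illustration.

```python
import pandas as pd

# Made-up numeric data so describe() has something to summarise.
df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [4.0, 5.0, 6.0]},
                  index=['r1', 'r2', 'r3'])

stats = df.describe()      # count, mean, std, min, quartiles, max per column
values = df.to_numpy()     # the underlying values as a 2d NumPy array
col_names = list(df.columns)
row_labels = list(df.index)

print(stats)
print(values.shape, col_names, row_labels)
```

Since df is already a dataframe, to_numpy() can be called on it directly, with no need to wrap it in pd.DataFrame() first.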