pandas Flashcards
what is loc?
it selects rows by values. for example: elections.loc[0:4] returns the first 4 rows in the election table, as they are rows 0 to rows 4 end inclusive.
what is the tail function?
returns the last n of table: elections.tail(2) returns the last 2 rows in the elections table.
how can you use loc for column names?
example: elections.loc[0:4, “year” : “party”]
what is iloc?
selects values by number. example: elections.iloc[[1, 2, 3][0,1,2]] returns the 2nd 3rd and 4th columns then the 1st 2nd and 3rd columns
how do loc and iloc differ?
loc gets rows (and/or columns) with particular labels. iloc gets rows (and/or columns) at integer locations.
how do we get a datafram out of our loc or iloc function?
use to frame:
elections[“candidate”].tail(5).to_frame()
what is isin?
isin helps us find the values we are looking for in the data frame:
wanted = [“Anti-Masonic”, “American”, “Anti-Monopoly”, “American Independent”]
elections[elections[“Party”].isin(wanted)]
what is str.strartswith?
str.startswith helps us find the values in a specific column that start with the specified letter:
elections[elections[“Party”].str.startswith(“A”)]
how do we use query?
query lets you choose when values equal or are more/less than speific numbers, or when values = win, i.e. a specific value.
elections.query(‘Year >= 2000 and Result == “win”’)
what does .describe() return?
std, mean, etc.
what is value_counts()?
value_counts() retunrs the # of occurences of unique values;
what is unique()?
returns every unique value in the specified column:
elections[‘party’].unique()
what is sort_values()?
this function sorts values in word columns A-Z without specification, can sort numbers as well, like in column ‘%’:
elections.sort_values(“%”, ascending = False)
what is str.len()?
a funcion that returns the length of the values in a specified column.
babyname_lengths = babynames[“Name”].str.len()
how do we drop a column?
babynames = babynames.drop(“name_lengths”, axis = ‘columns’)
HIGH LEVEL : in babynames df, sort by the number of occurrences of “dr” plus the number of occurences of “ea”.
create the temporary column
def dr_ea_count(string):
return string.count(‘dr’) + string.count(‘ea’)
babynames[“dr_ea_count”] = babynames[“Name”].map(dr_ea_count)
babynames = babynames.sort_values(by = “dr_ea_count”, ascending=False)
babynames.head()
use groupby; Try to create a groupby.agg call that gives the total babies born with each name.
puzzle1 = female_babynames.groupby(“Name”)[[“Count”]].agg(sum)
how do we rename columns?
rtp_table = rtp_table.rename(columns = {“Count”: “Count RTP”})
make the babynames pivot table;
babynames_pivot = babynames.pivot_table(
index=
columns=’
values=
aggfunc=
)
babynames_pivot.head(6)
babynames_pivot = babynames.pivot_table(
index=’Year’, # the rows (turned into index)
columns=’Sex’, # the column values
values=[‘Count’], # the field(s) to processed in each group
aggfunc=np.sum, # group operation
)
babynames_pivot.head(6)
how do we merge two tables that have name columns in common?
merged = pd.merge(left = elections, right = male_2020_babynames,
left_on = “First Name”, right_on = “Name”)
merged