3.2 Modify or Create new columns of Data Flashcards
How do you create a column?
pg['yellow_cards'] = 1
Here the data file is loaded as “pg”, “yellow_cards” is the column being created, and 1 is the value assigned to every row.
What are the three main column types?
Number, String and Boolean
How would you calculate a shot-goal percentage from a datafile?
pg['shot_pct'] = 100*pg['goal']/pg['shot']
–> This creates a new column called “shot_pct”, which is calculated by dividing the column “goal” by “shot”, and multiplying this by 100 (for a percentage)
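A minimal sketch of both cards above (a constant column and a calculated column). The small DataFrame is made-up stand-in data, not the real file:

```python
import pandas as pd

# Hypothetical stand-in for the player-game data the cards call "pg".
pg = pd.DataFrame({
    "name": ["A. Player", "B. Player", "C. Player"],
    "shot": [4, 2, 5],
    "goal": [1, 0, 2],
})

# Assigning a single value fills every row of the new column with it.
pg["yellow_cards"] = 1

# Arithmetic between columns is carried out row by row.
pg["shot_pct"] = 100 * pg["goal"] / pg["shot"]
```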
Which Python library is more “raw” and “math” oriented?
NumPy (usually imported as “np”)
How do you output a number of random rows from your DataFrame?
Use sample(), e.g. pg.sample(5) for five random rows
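A short sketch of sample(), using made-up rows. The random_state argument is optional and is only set here so the same rows come back on every run:

```python
import pandas as pd

# Hypothetical stand-in data.
pg = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "pos": ["DEF", "MID", "FWD", "MID", "DEF"],
})

# Pick 3 random rows; random_state makes the pick repeatable.
random_rows = pg.sample(n=3, random_state=0)
```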
How do you apply string methods to a column?
Call .str on the column, e.g. pg['name'].str.upper(). (To convert a non-string column into strings, use .astype(str) instead.)
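A minimal sketch of the .str accessor on made-up data. It also shows .astype(str), which is the usual pandas way to turn a non-string column into text; the column values here are assumptions for illustration:

```python
import pandas as pd

# Hypothetical stand-in data.
pg = pd.DataFrame({"name": ["Ana Silva", "Bo Chen"], "shot": [3, 1]})

# .str exposes string methods that run on every value in the column.
upper_names = pg["name"].str.upper()

# .astype(str) converts a numeric column into strings.
shots_as_text = pg["shot"].astype(str)
```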
How can you concatenate string columns together?
Use ‘+’!
(pg['name'] + ", " + pg['pos']).sample(5)
–> returns five random rows of the combined “name, pos” strings
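A small sketch of string concatenation with made-up names and positions:

```python
import pandas as pd

# Hypothetical stand-in data.
pg = pd.DataFrame({
    "name": ["Ana Silva", "Bo Chen"],
    "pos": ["DEF", "FWD"],
})

# "+" joins the columns row by row; the plain string ", " is
# repeated for every row.
name_pos = pg["name"] + ", " + pg["pos"]
```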
How could you use booleans to check if a player is a defender?
pg['is_defender'] = (pg['pos'] == 'DEF')
–> returns True when the player is a defender and False when the player is not
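A minimal sketch of a boolean column, on made-up positions:

```python
import pandas as pd

# Hypothetical stand-in data.
pg = pd.DataFrame({"pos": ["DEF", "MID", "DEF", "FWD"]})

# The comparison is evaluated for every row and yields True or False.
pg["is_defender"] = (pg["pos"] == "DEF")
```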
What should you add to a string if it has backslashes (as that may cause an error…)?
r"windows path here"
An r in front of a string makes it a raw string, so backslashes inside it are kept as literal characters instead of starting escape sequences. Useful for Windows paths.
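A quick sketch of why the r prefix matters. The path below is a made-up example, not one from the course files:

```python
# Without the r prefix, every backslash has to be doubled.
escaped = "C:\\data\\new_folder\\players.csv"

# With the r prefix, backslashes are taken literally.
raw = r"C:\data\new_folder\players.csv"

# Both spellings produce the same string.
same_path = (escaped == raw)

# "\n" is one newline character; r"\n" is a backslash followed by n.
newline_length = len("\n")
raw_length = len(r"\n")
```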
What is “|” used for in Python?
Shorthand for calling the __or__ method. Between two boolean columns it gives a row-by-row “or”.
How do you reverse True and False? (eg. you want true values to be false?)
Add a ~ (tilde) in front, so:
pg['is_a_mid_or_fwd'] = ~((pg['pos'] == 'MID') |
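A complete sketch of combining | and ~ on made-up data. The position labels 'FWD' and 'GKP' and the column name is_not_mid_or_fwd are assumptions for illustration:

```python
import pandas as pd

# Hypothetical stand-in data; the position labels are assumed.
pg = pd.DataFrame({"pos": ["DEF", "MID", "FWD", "GKP"]})

# "|" is a row-by-row "or". Each comparison needs its own
# parentheses because "|" binds more tightly than "==".
is_mid_or_fwd = (pg["pos"] == "MID") | (pg["pos"] == "FWD")

# "~" flips every True to False and every False to True.
pg["is_not_mid_or_fwd"] = ~is_mid_or_fwd
```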
How do you have your column data go through a function?
create a function (eg. “is_south_america”)
pg['is_SA'] = pg['team'].apply(is_south_america)
So use “.apply(function)”
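A minimal sketch of .apply(). The body of is_south_america and the team list are assumptions; the card only gives the function name:

```python
import pandas as pd

def is_south_america(team):
    # Assumed implementation: check the team against a short list.
    return team in ["Brazil", "Argentina", "Uruguay", "Colombia", "Peru"]

# Hypothetical stand-in data.
pg = pd.DataFrame({"team": ["Brazil", "France", "Peru"]})

# .apply() calls the function once per value and collects the results.
pg["is_SA"] = pg["team"].apply(is_south_america)
```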
How can you rename a column?
pg.rename(columns={'min': 'minutes'}, inplace=True)
Call the rename method and pass it a dictionary that maps old names to new names, for example renaming ‘min’ to ‘minutes’.
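A short sketch of rename() on made-up data, showing both the version that returns a renamed copy and the inplace=True version from the card:

```python
import pandas as pd

# Hypothetical stand-in data.
pg = pd.DataFrame({"name": ["Ana Silva", "Bo Chen"], "min": [90, 45]})

# Without inplace=True, rename returns a new DataFrame and leaves
# pg untouched.
renamed = pg.rename(columns={"min": "minutes"})
columns_before = list(pg.columns)

# With inplace=True, pg itself is changed.
pg.rename(columns={"min": "minutes"}, inplace=True)
```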
What are missing values represented by in Pandas?
np.nan
(because NumPy is the library that pandas is built on, and nan stands for “not a number”)
What functions can be used to determine whether values are missing?
isnull() and notnull()
–> These return a column of booleans indicating, row by row, whether each value is (isnull) or is not (notnull) missing
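A minimal sketch of finding missing values, with one made-up gap in the shot column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with one missing shot count.
pg = pd.DataFrame({
    "name": ["A", "B", "C"],
    "shot": [2, np.nan, 5],
})

# True where the value is missing.
missing = pg["shot"].isnull()

# True where the value is present.
present = pg["shot"].notnull()

# True counts as 1, so summing gives the number of missing values.
n_missing = int(missing.sum())
```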