3.2 Modify or Create new columns of Data Flashcards
How do you create a column?
pg['yellow_cards'] = 1
Here the data file has been loaded as “pg”, “yellow_cards” is the column being created, and 1 is the value assigned to every row.
What are the three main column types?
Number, String and Boolean
How would you calculate a shot-goal percentage from a datafile?
pg['shot_pct'] = 100 * pg['goal'] / pg['shot']
–> This creates a new column called “shot_pct”, which is calculated by dividing the column “goal” by “shot”, and multiplying this by 100 (for a percentage)
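A minimal sketch of the two cards above, run on a small made-up table. The DataFrame contents are invented for illustration; only the column names “goal” and “shot” come from the cards.

```python
import pandas as pd

# Hypothetical stand-in for the real data file loaded as "pg".
pg = pd.DataFrame({
    "name": ["A", "B", "C"],
    "goal": [1, 0, 2],
    "shot": [4, 2, 5],
})

# A single value is assigned to every row of the new column.
pg["yellow_cards"] = 1

# Arithmetic on columns works row by row.
pg["shot_pct"] = 100 * pg["goal"] / pg["shot"]

print(pg)
```

Rows with zero shots would not give a meaningful percentage, so it may be worth checking for those before relying on the result.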
Which python library is more “raw” and “math” oriented?
NumPy (usually imported as “np”)
How do you output a number of random rows from your DataFrame?
Use .sample(n), e.g. pg.sample(5) returns five random rows.
How do you use string methods on a column?
Call .str on a string column, e.g. pg['name'].str.upper(). If the column is not a string yet, convert it first with .astype(str).
How can you concatenate string columns together?
Use ‘+’!
(pg['name'] + ', ' + pg['pos']).sample(5)
–> returns five random rows of the combined string “name, pos”
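A short sketch of concatenation and sampling together, on invented data. The names and positions are placeholders, and random_state is only set so that the sample is repeatable.

```python
import pandas as pd

# Hypothetical players; only the column names follow the cards.
pg = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "pos": ["DEF", "MID", "FWD", "DEF"],
})

# '+' joins string columns row by row; a plain string is repeated for every row.
combined = pg["name"] + ", " + pg["pos"]

# sample(n) picks n random rows; random_state makes the pick repeatable.
print(combined.sample(2, random_state=0))
```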
How could you use booleans to check if a player is a defender?
pg['is_defender'] = (pg['pos'] == 'DEF')
–> the new column is True when the player is a defender and False when the player is not
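As a sketch on made-up data, the comparison produces one boolean per row, and the boolean column can then be used to count or filter players.

```python
import pandas as pd

# Hypothetical players for illustration.
pg = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "pos": ["DEF", "MID", "DEF"],
})

# The comparison is evaluated for every row.
pg["is_defender"] = pg["pos"] == "DEF"

# True counts as 1, so sum() gives the number of defenders.
n_defenders = pg["is_defender"].sum()

# A boolean column can also select rows.
defenders = pg[pg["is_defender"]]
print(n_defenders)
print(defenders)
```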
What should you add to a string if it has backslashes (as that may cause an error…)?
r"windows path here"
An r in front of the opening quote makes it a raw string: backslashes are kept as literal characters instead of starting escape sequences such as \n or \t. This is useful for Windows paths.
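To see the difference, compare the same text with and without the r prefix. The path is a made-up example, chosen because \n and \t happen to be escape sequences.

```python
# Hypothetical Windows path, used only for illustration.
plain = "C:\new_data\table.csv"   # \n becomes a newline, \t becomes a tab
raw = r"C:\new_data\table.csv"    # backslashes are kept as typed

print(repr(plain))
print(repr(raw))
```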
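What is “|” used for in Python?
Shorthand for calling the __or__ method. On pandas boolean columns it is an element-wise “or”: the result is True wherever either side is True.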
How do you reverse True and False? (eg. you want true values to be false?)
Add a ~ (tilde) in front, so:
pg['is_a_mid_or_fwd'] = ~((pg['pos'] == 'MID') |
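A sketch of | and ~ working together on invented data. It assumes the second condition is pos == 'FWD', which the column name suggests but the card does not show; the column names used here are also illustrative.

```python
import pandas as pd

# Hypothetical players for illustration.
pg = pd.DataFrame({"pos": ["DEF", "MID", "FWD", "GKP"]})

is_mid = pg["pos"] == "MID"
is_fwd = pg["pos"] == "FWD"  # assumption: the forward label is 'FWD'

# | is True where either condition is True.
pg["mid_or_fwd"] = is_mid | is_fwd

# ~ flips every True to False and every False to True.
pg["not_mid_or_fwd"] = ~(is_mid | is_fwd)

print(pg)
```

When the comparisons are written inline, each one needs its own parentheses, because | binds more tightly than ==.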
How do you have your column data go through a function?
create a function (eg. “is_south_america”)
pg['is_SA'] = pg['team'].apply(is_south_america)
So use “.apply(function)”
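One way the function could look, as a sketch. The card only gives the name “is_south_america”; the body and the set of teams below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical, incomplete set of teams; extend it to match the real data.
SOUTH_AMERICAN_TEAMS = {"Brazil", "Argentina", "Uruguay", "Colombia", "Peru"}

def is_south_america(team):
    """Return True if the team name is in the set above."""
    return team in SOUTH_AMERICAN_TEAMS

pg = pd.DataFrame({"team": ["Brazil", "Germany", "Peru"]})

# apply() calls the function once per value in the column.
pg["is_SA"] = pg["team"].apply(is_south_america)
print(pg)
```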
How can you rename a column?
pg.rename(columns={'min': 'minutes'}, inplace=True)
Call the rename method and pass it a dictionary that maps old names to new names, for example renaming 'min' to 'minutes'.
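A small sketch of rename on made-up data. It assigns the result to a new variable instead of using inplace=True, to show that rename otherwise leaves the original DataFrame unchanged.

```python
import pandas as pd

# Hypothetical data; only the column name 'min' comes from the card.
pg = pd.DataFrame({"name": ["Ana", "Ben"], "min": [90, 45]})

# Without inplace=True, rename returns a new DataFrame.
renamed = pg.rename(columns={"min": "minutes"})

print(list(pg.columns))
print(list(renamed.columns))
```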
What are missing values represented by in Pandas?
np.nan
(because numpy is the library that pandas is built upon, and nan stands for “not a number”)
What functions can be used to determine whether values are missing?
isnull() and notnull()
–> These return a column of booleans, one per row, indicating whether each value is missing (isnull) or present (notnull)
What method can be used to replace all missing values with a value of choice?
.fillna(any_value_here)
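The three missing-value methods from the cards above, sketched on invented data. Filling with 0 is only an example; the right fill value depends on the column.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
pg = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "shot_pct": [25.0, np.nan, 40.0],
})

missing = pg["shot_pct"].isnull()    # True where the value is missing
present = pg["shot_pct"].notnull()   # True where the value is present

# fillna returns a new column with the missing values replaced.
filled = pg["shot_pct"].fillna(0)

print(missing.tolist())
print(filled.tolist())
```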
If a date is stored as a string such as '20180618' (18 June 2018), how would you extract only the month?
pg['month'] = pg['date'].astype(str).str[4:6]
–> This creates a new column called 'month' by converting the date to a string and slicing out the characters at positions 4 and 5 (counting from 0), which gives '06'.
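A sketch of the slice on made-up dates, stored here as integers so that astype(str) has something to convert. The last two columns are additions for comparison and are not from the card: one converts the sliced text to a number, the other parses the date with pd.to_datetime.

```python
import pandas as pd

# Hypothetical dates in YYYYMMDD form.
pg = pd.DataFrame({"date": [20180618, 20180703]})

# Convert to string, then slice characters 4 and 5.
pg["month"] = pg["date"].astype(str).str[4:6]

# The slice is still text; astype(int) turns it into a number.
pg["month_num"] = pg["month"].astype(int)

# Alternative: parse the date and read the month from it.
parsed = pd.to_datetime(pg["date"].astype(str), format="%Y%m%d")
pg["month_parsed"] = parsed.dt.month

print(pg)
```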
What can .astype() be used for?
The astype() method returns a new column or DataFrame whose data has been converted to the specified type, e.g. pg['date'].astype(str). The original is left unchanged.
How can you drop a column?
pg.drop('name', axis=1, inplace=True)
This drops the 'name' column from the DataFrame. The “axis=1” is necessary because drop operates on rows by default; passing axis=1 tells it to look for the name among the columns instead.
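A sketch of drop on invented data, without inplace=True so the original stays available for comparison. The columns= keyword is an equivalent spelling that avoids the axis argument.

```python
import pandas as pd

# Hypothetical data for illustration.
pg = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "pos": ["DEF", "MID"],
    "goal": [0, 2],
})

# axis=1 tells drop to look for 'name' among the columns.
without_name = pg.drop("name", axis=1)

# Same result using the columns= keyword.
also_without_name = pg.drop(columns="name")

print(list(without_name.columns))
```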