3.3 Built-In Pandas Functions that work on Dataframes Flashcards
What built-in pandas function lets you take the mean of numeric columns in a dataframe?
.mean()
pg[['shot', 'goal', 'assist']].mean()
--> gives the mean of each column
What built-in pandas function lets you take the max of numeric columns in a dataframe?
.max()
pg[['shot', 'goal', 'assist']].max()
--> gives the max of each column
What does .max() do that .mean() doesn't?
.max() can also operate on string columns, where it treats the "max" as the value that comes last alphabetically
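A minimal sketch of the three cards above. The small `pg` frame and its values are made up for illustration; only the column names come from the cards.

```python
import pandas as pd

# Hypothetical player-game data; the values are invented for this example.
pg = pd.DataFrame({
    "name": ["lukaku", "kane", "modric"],
    "shot": [4, 3, 1],
    "goal": [2, 1, 0],
    "assist": [0, 1, 2],
})

col_means = pg[["shot", "goal", "assist"]].mean()  # one mean per column
col_maxes = pg[["shot", "goal", "assist"]].max()   # one max per column
last_name = pg["name"].max()                       # string max: last alphabetically

print(col_means)
print(col_maxes)
print(last_name)  # modric
```

The numeric columns are selected explicitly before calling .mean(), since a mean of a string column is not meaningful.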
What is important to remember about axis when using pandas built-in functions?
Whether you want the statistic calculated for each column or for each row:
one result per column (the default, axis=0),
one result per row (axis=1)
Say you want to calculate each player's mean number of goals, and the data has one row per player with the goals spread out per match (7 columns for goals, 1 column per match). What should the axis argument equal?
axis=1, as the mean should be calculated across each row, and NOT down each column!
Using axis=0 would give the mean goals scored per match by ALL players
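The difference between the two axis values can be sketched on a small wide table. The frame below is hypothetical and uses 3 match columns instead of 7 to keep it short.

```python
import pandas as pd

# Hypothetical wide data: one row per player, one goals column per match.
goals = pd.DataFrame(
    {"match1": [1, 0, 2], "match2": [1, 0, 4], "match3": [4, 3, 0]},
    index=["player_a", "player_b", "player_c"],
)

per_player = goals.mean(axis=1)  # across the columns: one mean per row (player)
per_match = goals.mean(axis=0)   # down the rows: one mean per column (match)

print(per_player)  # player_a 2.0, player_b 1.0, player_c 2.0
print(per_match)
```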
What do 0 and 1 mean when using built-in summary stats on boolean columns?
0 = False
1 = True
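Because True counts as 1 and False as 0, .sum() on a boolean column counts the True values and .mean() gives the share of True values. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical goals per player-game.
pg = pd.DataFrame({"goal": [2, 0, 1, 0]})

scored = pg["goal"] > 0        # boolean Series: True, False, True, False
n_scored = scored.sum()        # number of True values
share_scored = scored.mean()   # proportion of True values

print(n_scored)      # 2
print(share_scored)  # 0.5
```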
What are the two boolean-specific summary functions in pandas?
.any()
.all()
What does the .any() boolean-specific function do?
Returns True if at least one value is True
e.g. can be used to check if any player made more than 100 passes
(pg['pass'] > 100).any()
What does the .all() boolean-specific function do?
Returns True only if every value is True
e.g. can be used to check if ALL players made at least one pass
(pg['pass'] > 0).all()
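Both checks can be tried on a tiny frame. The pass counts below are invented for the example.

```python
import pandas as pd

# Hypothetical pass counts for three players.
pg = pd.DataFrame({"name": ["a", "b", "c"], "pass": [120, 45, 0]})

any_over_100 = bool((pg["pass"] > 100).any())  # at least one player above 100
all_passed = bool((pg["pass"] > 0).all())      # every player made a pass

print(any_over_100)  # True
print(all_passed)    # False, because player c made 0 passes
```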
Do the boolean-specific functions .any() and .all() take an argument?
Yes, they take an axis argument.
e.g. setting axis=1 on a dataframe of boolean columns goes through each row (player) separately, which can be used to check whether that player won more than 5 air duels
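One way the axis=1 case could look, assuming (this layout is not in the original) a frame with one row per player and one air-duels-won column per match:

```python
import pandas as pd

# Hypothetical air duels won, one column per match.
air = pd.DataFrame(
    {"match1": [6, 2, 1], "match2": [3, 4, 7]},
    index=["player_a", "player_b", "player_c"],
)

over_5 = air > 5                     # dataframe of booleans
in_any_match = over_5.any(axis=1)    # per player: more than 5 in at least one match
in_every_match = over_5.all(axis=1)  # per player: more than 5 in every match

print(in_any_match)    # player_a True, player_b False, player_c True
print(in_every_match)  # False for all three players
```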
Which function returns a summary of the frequency of individual values?
.value_counts()
What argument can be passed to .value_counts() to make the values add up to 1 so they represent proportions?
normalize=True
pg['team'].value_counts(normalize=True)
--> this grabs the column of teams and outputs, for each team name, the proportion of rows in the dataframe where it appears (e.g. 0.25 means 25%).
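A short sketch comparing raw counts with normalized counts, using a made-up team column:

```python
import pandas as pd

# Hypothetical team column with four rows.
pg = pd.DataFrame({"team": ["belgium", "belgium", "france", "england"]})

counts = pg["team"].value_counts()                # raw frequencies
shares = pg["team"].value_counts(normalize=True)  # proportions that sum to 1

print(counts["belgium"])  # 2
print(shares["belgium"])  # 0.5
```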
What does the “crosstab” function do?
Shows the frequencies for ALL the combinations of the two columns
pd.crosstab(pg['team'], pg['pos'])
--> this outputs a dataframe showing how often each country (team) appears in the data in combination with each position
(e.g. a row with 'romelu lukaku', 'belgium', 'FWD')
--> this counts as 1 instance of the combination 'belgium' and 'FWD'
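The crosstab card can be checked on a few made-up rows. The team and position values below are illustrative only.

```python
import pandas as pd

# Hypothetical team/position pairs, one row per player.
pg = pd.DataFrame({
    "team": ["belgium", "belgium", "france", "france", "france"],
    "pos": ["FWD", "MID", "FWD", "FWD", "DEF"],
})

table = pd.crosstab(pg["team"], pg["pos"])  # rows: team, columns: pos
print(table)

print(table.loc["france", "FWD"])   # 2
print(table.loc["belgium", "DEF"])  # 0, this combination never occurs
```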
Which pandas function is used to change the dataframe format from wide to long?
melt() (pandas has no unmelt() function; the reverse, long to wide, is done with pivot())
“Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.”
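A minimal melt() sketch on a hypothetical wide frame. The names `match` and `goals` for the new columns are choices made for this example.

```python
import pandas as pd

# Hypothetical wide data: one goals column per match.
wide = pd.DataFrame({"name": ["a", "b"], "match1": [1, 0], "match2": [2, 3]})

# Keep "name" as the identifier; every other column becomes a (match, goals) row.
long = wide.melt(id_vars="name", var_name="match", value_name="goals")

print(long)
print(long.shape)  # (4, 3)
```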
Which pandas function allows you to restructure a dataframe by turning rows into columns?
pivot()
“Return reshaped DataFrame organized by given index / column values.”
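A matching pivot() sketch, going from long back to wide. The frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical long data: one row per player and match.
long = pd.DataFrame({
    "name": ["a", "a", "b", "b"],
    "match": ["match1", "match2", "match1", "match2"],
    "goals": [1, 2, 0, 3],
})

# Each distinct value in "match" becomes its own column.
wide = long.pivot(index="name", columns="match", values="goals")

print(wide)
print(wide.loc["a", "match2"])  # 2
```

Note that pivot() expects each index/column pair to appear only once in the data.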