3.3 Built-In Pandas Functions that work on Dataframes Flashcards

1
Q

What built-in pandas function lets you take the mean of numeric columns in a dataframe?

A

.mean()

pg[['shot', 'goal', 'assist']].mean()
--> gives the mean of each column

2
Q

What built-in pandas function lets you take the max of numeric columns in a dataframe?

A

.max()

pg[['shot', 'goal', 'assist']].max()
--> gives the max of each column

3
Q

What does max() do that mean() doesn’t?

A

.max() can also operate on string columns: it treats the "max" as the value that comes last in alphabetical order. .mean() only makes sense for numeric columns.
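A minimal sketch of the difference, assuming a small made-up stand-in for the pg table (the names and numbers are illustrative only, not taken from the deck's data):

```python
import pandas as pd

# Hypothetical stand-in for the `pg` table used on these cards.
pg = pd.DataFrame({
    "name": ["player_c", "player_a", "player_b"],
    "shot": [4, 2, 0],
    "goal": [1, 0, 0],
})

# max() handles the string column too: the value that sorts last wins.
col_max = pg.max()
print(col_max)

# For mean(), select the numeric columns explicitly.
col_mean = pg[["shot", "goal"]].mean()
print(col_mean)
```

Selecting the numeric columns before calling .mean() keeps the call safe regardless of how a given pandas version treats text columns.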

4
Q

What is important to remember about axis when using pandas built-in functions?

A

Whether you want one result per column or one result per row:

axis=0 (the default): one value per column, calculated down the rows
axis=1: one value per row, calculated across the columns

5
Q

Say you want to calculate each player's mean number of goals, and the data has one row per player and 7 goal columns (one column per match). What should the axis argument equal?

A

axis=1, as the mean should be calculated across the rows, and NOT columns!

Using axis=0 would give the mean goals scored per game by ALL players
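A small sketch of both directions, assuming a made-up table with one row per player and three match columns instead of seven (all names and values are hypothetical):

```python
import pandas as pd

# Hypothetical goals table: one row per player, one column per match.
goals = pd.DataFrame(
    {"match1": [1, 0, 2], "match2": [0, 0, 1], "match3": [2, 0, 3]},
    index=["player_a", "player_b", "player_c"],
)

# axis=1: one mean per row, i.e. each player's mean goals per match.
per_player = goals.mean(axis=1)

# axis=0 (default): one mean per column, i.e. mean goals in each match
# across all players.
per_match = goals.mean(axis=0)

print(per_player)
print(per_match)
```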

6
Q

What do 0 and 1 mean when using built-in summary stats on boolean columns?

A

0 = False
1 = True
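Because True counts as 1 and False as 0, .sum() on a boolean column counts the True values and .mean() gives their share. A short sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical goal counts for four players.
pg = pd.DataFrame({"goal": [1, 0, 2, 0]})

scored = pg["goal"] > 0      # boolean Series: True, False, True, False
n_scored = scored.sum()      # number of True values
share_scored = scored.mean() # fraction of True values

print(n_scored, share_scored)
```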

7
Q

What are the two boolean-specific summary functions in pandas?

A

.any()

.all()

8
Q

What does the .any() boolean specific function do?

A

Returns True if at least one value is True.

E.g. it can be used to check whether any player made more than 100 passes:

(pg['pass'] > 100).any()

9
Q

What does the .all() boolean specific function do?

A

Returns True only if every value is True.

E.g. it can be used to check whether ALL players made at least one pass:

(pg['pass'] > 0).all()

10
Q

Do the boolean specific functions “any()” and “all()” take an argument?

A

Yes, they take an axis argument.

Setting axis=1 applies the check to each row (player) separately, for example whether that player won more than 5 air duels.
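A sketch of .any() and .all() with and without axis, assuming hypothetical column names (air_won_m1 and air_won_m2 are invented here for illustration, as are the values):

```python
import pandas as pd

# Hypothetical data: passes plus air duels won in two matches.
pg = pd.DataFrame(
    {
        "pass": [120, 45, 80],
        "air_won_m1": [6, 2, 0],
        "air_won_m2": [1, 7, 3],
    },
    index=["player_a", "player_b", "player_c"],
)

# On a single boolean column, the result is one True/False.
any_over_100 = (pg["pass"] > 100).any()
all_passed = (pg["pass"] > 0).all()

# On several columns with axis=1, the result is one True/False per row:
# did this player win more than 5 air duels in any of the matches?
over_5_air = (pg[["air_won_m1", "air_won_m2"]] > 5).any(axis=1)

print(any_over_100, all_passed)
print(over_5_air)
```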

11
Q

Which function returns a summary of the frequency of individual values?

A

.value_counts()

12
Q

What argument can be passed to ".value_counts()" so that the values add up to 1 and represent proportions?

A

normalize=True

pg['team'].value_counts(normalize=True)

--> this takes the team column and returns, for each team name, the proportion of rows in the dataset in which it appears (a fraction between 0 and 1, not a percentage).
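A quick sketch comparing raw counts with normalized counts, using a made-up team column:

```python
import pandas as pd

# Hypothetical team column with four rows.
pg = pd.DataFrame({"team": ["belgium", "belgium", "france", "brazil"]})

counts = pg["team"].value_counts()                # raw frequencies
shares = pg["team"].value_counts(normalize=True)  # fractions summing to 1

print(counts)
print(shares)
```

Multiplying shares by 100 gives percentages if those are preferred.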

13
Q

What does the “crosstab” function do?

A

Shows the frequencies for ALL the combinations of the values in two columns

pd.crosstab(pg['team'], pg['pos'])

--> this outputs a frame counting how often each country (team) appears in the data together with each position

(e.g. 'romelu lukaku', 'belgium', 'FWD')
--> this row counts as 1 instance of the combination 'belgium' and 'FWD'
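A minimal sketch of the resulting table, assuming a tiny made-up dataset (only the 'romelu lukaku' row comes from the card; the other rows are hypothetical):

```python
import pandas as pd

pg = pd.DataFrame({
    "name": ["romelu lukaku", "player_b", "player_c", "player_d"],
    "team": ["belgium", "belgium", "france", "france"],
    "pos": ["FWD", "MID", "FWD", "FWD"],
})

# Rows are teams, columns are positions, cells are counts.
table = pd.crosstab(pg["team"], pg["pos"])
print(table)
```

Combinations that never occur, such as 'france' with 'MID' here, show up as 0.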

14
Q

Which pandas function is used to change the dataframe format from wide to long?

A

melt() (pandas has no unmelt(); the reverse reshape, long to wide, is done with pivot())

“Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.”
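A short sketch of a wide-to-long reshape, assuming a made-up table with one goal column per match (the column names chosen for var_name and value_name are illustrative):

```python
import pandas as pd

# Hypothetical wide table: one row per player, one column per match.
wide = pd.DataFrame({
    "name": ["player_a", "player_b"],
    "match1": [1, 0],
    "match2": [0, 2],
})

# Keep `name` as the identifier; stack the match columns into rows.
long = wide.melt(id_vars="name", var_name="match", value_name="goals")
print(long)
```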

15
Q

Which pandas function allows you to restructure a dataframe by turning rows into columns?

A

pivot()

“Return reshaped DataFrame organized by given index / column values.”
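A sketch of the long-to-wide direction, assuming a made-up long table with one row per player and match:

```python
import pandas as pd

# Hypothetical long table: one row per (player, match) pair.
long = pd.DataFrame({
    "name": ["player_a", "player_a", "player_b", "player_b"],
    "match": ["match1", "match2", "match1", "match2"],
    "goals": [1, 0, 0, 2],
})

# Values in `match` become the new columns; `name` becomes the index.
wide = long.pivot(index="name", columns="match", values="goals")
print(wide)
```

Note that pivot() requires each index/column pair to be unique; duplicate pairs raise an error.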

16
Q

Which pandas function allows you to combine DataFrames or named Series into a singular DataFrame?

A

.merge()

“Merge DataFrame or named Series objects with a database-style join.”
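A minimal sketch of a database-style join, assuming two made-up tables that share a team column (group_label is a hypothetical column invented for the example):

```python
import pandas as pd

players = pd.DataFrame({
    "name": ["player_a", "player_b", "player_c"],
    "team": ["belgium", "belgium", "france"],
})
teams = pd.DataFrame({
    "team": ["belgium", "france"],
    "group_label": ["X", "Y"],  # hypothetical team-level attribute
})

# Left join: keep every player row and attach the matching team info.
merged = players.merge(teams, on="team", how="left")
print(merged)
```

The how argument controls the join type ("left", "right", "inner", "outer"); "inner" is the default.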

17
Q

Which pandas function attempts to infer the datetime format?

A

pd.to_datetime()

“Pandas to_datetime will attempt to infer the datetime format based on the input, which can make parsing faster.”
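A small sketch of parsing date strings, once with the format inferred and once with an explicit format string (the dates are made-up sample values):

```python
import pandas as pd

# Hypothetical date strings in ISO year-month-day form.
dates = pd.Series(["2018-06-14", "2018-07-15"])

# Let pandas infer the format from the input.
parsed = pd.to_datetime(dates)

# Or state the format explicitly, which removes any ambiguity.
explicit = pd.to_datetime(dates, format="%Y-%m-%d")

print(parsed.dt.month.tolist())
```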
