3.2 Modify or Create new columns of Data Flashcards

1
Q

How do you create a column?

A

pg['yellow_cards'] = 1

Here the data file is loaded as the DataFrame “pg”, “yellow_cards” is the column being created, and 1 is the value assigned to every row.

2
Q

What are the three main column types?

A

Number, String and Boolean

3
Q

How would you calculate a shot-goal percentage from a datafile?

A

pg['shot_pct'] = 100 * pg['goal'] / pg['shot']

–> This creates a new column called “shot_pct”, calculated by dividing the column “goal” by the column “shot” and multiplying by 100 (to give a percentage).
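As a minimal sketch of the two column-creation patterns so far, the snippet below builds a small stand-in for “pg”. The rows and values are made up for illustration; only the column names come from the cards.

```python
import pandas as pd

# Hypothetical stand-in for the "pg" data file; the values are invented.
pg = pd.DataFrame({
    "name": ["A", "B", "C"],
    "shot": [4, 2, 5],
    "goal": [1, 0, 5],
})

# Assigning a scalar puts the same value in every row.
pg["yellow_cards"] = 1

# Arithmetic between columns is done row by row.
pg["shot_pct"] = 100 * pg["goal"] / pg["shot"]

print(pg)
```

Rows where “shot” is 0 would not raise an error here; pandas fills such results with inf or NaN, so they may need separate handling.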

4
Q

Which Python library is more “raw” and “math” oriented?

A

NumPy (usually imported as “np”)

5
Q

How do you output a number of random rows from your DataFrame?

A

Use .sample(n), for example pg.sample(5) for five random rows.
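A short sketch of sample() on an invented DataFrame. The random_state argument is optional and is only used here so that repeated runs pick the same rows.

```python
import pandas as pd

# Hypothetical data for illustration only.
pg = pd.DataFrame({"name": list("ABCDEF"), "shot": [0, 1, 2, 3, 4, 5]})

# Draw 3 rows at random, without replacement.
picked = pg.sample(3, random_state=0)

print(picked)
```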

6
Q

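How do you manipulate a column as a string?

A

Call .str on a column to access string methods (for example pg['name'].str.upper()). Note that .str does not convert anything by itself; to turn a non-string column into strings, use .astype(str) first.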
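To illustrate the difference between using string methods and converting to strings, here is a small sketch. The names and dates are invented sample values.

```python
import pandas as pd

# Hypothetical data: "name" already holds strings, "date" holds integers.
pg = pd.DataFrame({
    "name": ["a. smith", "b. jones"],
    "date": [20180618, 20180619],
})

# .str exposes string methods on a column that already contains strings.
upper = pg["name"].str.upper()

# .astype(str) converts a non-string column into strings.
date_str = pg["date"].astype(str)

print(upper.tolist(), date_str.tolist())
```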

7
Q

How can you concatenate string columns together?

A

Use +:

(pg['name'] + ', ' + pg['pos']).sample(5)

–> returns five random rows, each a string of the form “name, pos”
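A minimal sketch of string concatenation across columns, using two invented rows so the result can be read in full:

```python
import pandas as pd

# Hypothetical data for illustration only.
pg = pd.DataFrame({"name": ["A. Smith", "B. Jones"], "pos": ["DEF", "FWD"]})

# + joins the columns row by row; the literal ", " is repeated for every row.
label = pg["name"] + ", " + pg["pos"]

print(label.tolist())
```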

8
Q

How could you use booleans to check if a player is a defender?

A

pg['is_defender'] = (pg['pos'] == 'DEF')

–> True for rows where the player is a defender, False where the player is not
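The boolean column can also be used to filter rows. The sketch below assumes a tiny invented “pg”; the filtering step is an addition for illustration and is not part of the card.

```python
import pandas as pd

# Hypothetical data for illustration only.
pg = pd.DataFrame({"name": ["A", "B", "C"], "pos": ["DEF", "MID", "FWD"]})

# The comparison is evaluated for every row and yields True or False.
pg["is_defender"] = pg["pos"] == "DEF"

# A boolean column can be used as a mask to keep only the True rows.
defenders = pg[pg["is_defender"]]

print(defenders)
```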

9
Q

What should you add to a string if it has backslashes (as that may cause an error…)?

A

r"windows path here"

An r in front of a string makes it a raw string: backslashes are kept as literal characters instead of starting escape sequences such as \n or \t. This is useful for Windows paths.
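A small sketch of what the r prefix changes. The path is hypothetical and chosen because it contains \n and \t, which would otherwise be read as a newline and a tab.

```python
# Hypothetical Windows path used only for illustration.
raw_path = r"C:\new_folder\table.csv"

# Without the r prefix, every backslash has to be doubled to mean the same thing.
escaped_path = "C:\\new_folder\\table.csv"

print(raw_path)
print(raw_path == escaped_path)

# In a raw string "\n" is two characters; in a normal string it is one newline.
print(len(r"\n"), len("\n"))
```

One limitation worth knowing: a raw string cannot end in a single backslash.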

10
Q

What is “|” used for in Python?

A

Shorthand for calling the __or__ method. On boolean pandas columns it performs an element-wise OR.
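As an illustrative sketch with invented data, the operator form and the explicit method call below give the same result on boolean columns:

```python
import pandas as pd

# Hypothetical data for illustration only.
pg = pd.DataFrame({"pos": ["DEF", "MID", "FWD"]})

is_mid = pg["pos"] == "MID"
is_fwd = pg["pos"] == "FWD"

# Element-wise OR: True where either condition holds.
either = is_mid | is_fwd

# The same operation written as an explicit method call.
same = is_mid.__or__(is_fwd)

print(either.tolist())
```

When the comparisons are written inline, each one needs its own parentheses, because | binds more tightly than ==.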

11
Q

How do you reverse True and False? (eg. you want true values to be false?)

A

Put a ~ (tilde) in front of the boolean expression, so:

pg['is_a_mid_or_fwd'] = ~((pg['pos'] == 'MID') |
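The code line on this card is cut off in the source, so the following is not a reconstruction of it. It is only a sketch of how ~ flips a boolean column, with invented data and hypothetical column names.

```python
import pandas as pd

# Hypothetical data for illustration only.
pg = pd.DataFrame({"pos": ["DEF", "MID", "FWD"]})

# True for midfielders and forwards.
pg["is_mid_or_fwd"] = (pg["pos"] == "MID") | (pg["pos"] == "FWD")

# ~ inverts every value: True becomes False and False becomes True.
pg["is_not_mid_or_fwd"] = ~pg["is_mid_or_fwd"]

print(pg)
```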

12
Q

How do you have your column data go through a function?

A

Create a function (e.g. “is_south_america”), then pass it to .apply():

pg['is_SA'] = pg['team'].apply(is_south_america)

.apply(function) calls the function once for each value in the column.
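The card names the function but does not show its body, so the definition below is an assumption made for illustration, as are the team names.

```python
import pandas as pd

# Assumed implementation: the card does not show the function body.
def is_south_america(team):
    return team in {"Brazil", "Argentina", "Uruguay"}

# Hypothetical data for illustration only.
pg = pd.DataFrame({"team": ["Brazil", "France", "Uruguay"]})

# apply() calls the function on each value and collects the results.
pg["is_SA"] = pg["team"].apply(is_south_america)

print(pg)
```

Note that the function is passed by name, without parentheses, so pandas can call it itself.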

13
Q

How can you rename a column?

A

pg.rename(columns={'min': 'minutes'}, inplace=True)

Call the rename method and pass it a dictionary that maps old names to new ones, for example renaming ‘min’ to ‘minutes’.
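A brief sketch of both ways to use rename, on invented data: returning a renamed copy, or changing the DataFrame in place as the card does.

```python
import pandas as pd

# Hypothetical data for illustration only.
pg = pd.DataFrame({"name": ["A"], "min": [90]})

# Without inplace=True, rename returns a new DataFrame and leaves pg alone.
renamed = pg.rename(columns={"min": "minutes"})

# With inplace=True, pg itself is modified and nothing is returned.
pg.rename(columns={"min": "minutes"}, inplace=True)

print(list(pg.columns))
```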

14
Q

What are missing values represented by in Pandas?

A

np.nan

(because pandas is built on NumPy; NaN stands for “not a number”)

15
Q

What functions can be used to determine whether values are missing?

A

isnull and notnull

–> Each returns a column of booleans, one per row, indicating whether that value is missing (isnull) or present (notnull).
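A minimal sketch of both checks on an invented column containing one np.nan. Counting the missing values with sum() is an addition for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
pg = pd.DataFrame({"name": ["A", "B", "C"], "goal": [1.0, np.nan, 3.0]})

missing = pg["goal"].isnull()    # True where the value is missing
present = pg["goal"].notnull()   # True where the value is present

# True counts as 1, so summing gives the number of missing values.
n_missing = missing.sum()

print(missing.tolist(), present.tolist(), n_missing)
```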

16
Q

What method can be used to replace all missing values with a value of choice?

A

.fillna(any_value_here)
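As a sketch on invented data, the example below replaces a missing value with 0. The choice of 0 is arbitrary; any value can be passed.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
pg = pd.DataFrame({"goal": [1.0, np.nan, 3.0]})

# fillna returns a new column; the original is unchanged unless reassigned.
filled = pg["goal"].fillna(0)

print(filled.tolist())
```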

17
Q

If a date is stored as a string such as ‘20180618’ (18 June 2018), how would you extract only the month?

A

pg['month'] = pg['date'].astype(str).str[4:6]

–> This creates a new column called ‘month’ by converting the date to a string and slicing out the characters at positions 4 and 5 (counting from 0), here ‘06’.
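A short sketch of the slicing step with two invented dates. The extra “month_num” column is a hypothetical addition showing how the sliced text could be turned back into a number.

```python
import pandas as pd

# Hypothetical data: dates stored as integers in YYYYMMDD form.
pg = pd.DataFrame({"date": [20180618, 20181203]})

# Convert to strings, then take characters 4 and 5 (the month digits).
pg["month"] = pg["date"].astype(str).str[4:6]

# The slice is still text; convert it if a numeric month is needed.
pg["month_num"] = pg["month"].astype(int)

print(pg)
```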

18
Q

What can .astype() be used for?

A

The astype() method returns a new object (a Series or DataFrame) whose data has been converted to the specified type; the original is left unchanged.

19
Q

How can you drop a column?

A

pg.drop('name', axis=1, inplace=True)

This drops a column from a DataFrame. The axis=1 argument is needed because drop operates on rows (axis=0) by default; passing axis=1 tells it to look for the given name among the columns instead.
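A minimal sketch of dropping a column from invented data. It also shows the columns= keyword, which is an equivalent spelling that avoids the axis argument.

```python
import pandas as pd

# Hypothetical data for illustration only.
pg = pd.DataFrame({"name": ["A", "B"], "pos": ["DEF", "FWD"], "goal": [0, 2]})

# axis=1 tells drop to look for "name" among the columns, not the row labels.
no_name = pg.drop("name", axis=1)

# Equivalent form using the columns keyword.
also_no_name = pg.drop(columns="name")

print(list(no_name.columns))
```

Neither call uses inplace=True here, so “pg” itself still contains the “name” column afterwards.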
