Pandas_JAS Flashcards

1
Q

How to refer to a single column in a DF

A

DFname['columnname'].head()  # head() returns the first 5 rows by default

2
Q

Technically, a single column is a ______ not a ______

A

Series; DataFrame

A Series is part of a DataFrame

3
Q

After typing code into the ______, highlight the code of interest and hit ____ to send it to the ________

A

Editor
F9
REPL (console)

4
Q

Normally you install third-party libraries with a tool like _____, but if you're using ________, Pandas comes preinstalled

A

pip

Anaconda Python Bundle

5
Q

When importing Pandas the convention is to name it ______

A

pd

import pandas as pd

6
Q

To load a csv file:

DFname = pd._______(_______(DATA_DIR, 'filename.csv'))

A

read_csv
path.join

DATA_DIR is a variable holding the path to the folder that contains your file, e.g., DATA_DIR = '/Users/UserName/PythonDirectory'. path.join comes from the standard library (from os import path).
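
A minimal sketch of the loading pattern above. The folder and file contents are hypothetical: a temporary directory stands in for your real DATA_DIR, and a tiny CSV is written first so the example runs anywhere.

```python
import tempfile
from os import path

import pandas as pd

# Hypothetical data directory; replace with your own folder path.
DATA_DIR = tempfile.mkdtemp()

# Write a small made-up CSV so read_csv has something to load.
with open(path.join(DATA_DIR, 'filename.csv'), 'w') as f:
    f.write('name,pos,rush_yards\nA,RB,120\nB,WR,15\n')

# path.join builds the full file path; read_csv returns a DataFrame.
DFname = pd.read_csv(path.join(DATA_DIR, 'filename.csv'))
print(type(DFname))
print(DFname.head())
```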

7
Q

What result does this give:

type(DFname)

A

pandas.core.frame.DataFrame

8
Q

What method returns (by default) the first 5 rows of a DataFrame?

A

head()

DFname.head()

9
Q

head() is a method: it is called with parentheses and can be passed the number of rows to return. ________ are accessed without parentheses.

A

attributes

10
Q

The attribute _______ returns the names of the columns in the DF

A

columns

11
Q

The attribute ______ returns the number of rows and columns in the DF

A

shape

12
Q

______ will turn any Series into a one-column DF

A

to_frame()

DFname['columnname'].to_frame().head()

13
Q

To refer to multiple columns in a DF, you pass it a ______, and the result is _______

A

list
a DataFrame

DFname[['col1', 'col2', 'col3']].head()
When selecting multiple columns you must use double brackets: the inner pair is the list of column names.
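
A short sketch contrasting single and double brackets, plus to_frame(). The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical example data.
DFname = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]})

one_col = DFname['col1']               # single brackets return a Series
two_cols = DFname[['col1', 'col2']]    # a list of names returns a DataFrame
as_frame = DFname['col1'].to_frame()   # Series converted to a one-column DataFrame

print(type(one_col))
print(type(two_cols))
print(as_frame.shape)
```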

14
Q

An index is a built-in column of ________. If you don't designate a column as the index, the default is a ________.

A

row IDs

series of numbers starting at zero

15
Q

Indexes can be _______

A

Any type of data (strings, dates, etc)

16
Q

How to assign an index to a DF?

A

DFname.set_index('columnname')

This returns a copy of the DF with this index; the original DF is unchanged.

17
Q

What must be passed to set_index() to create the index on the original DF?

A

inplace=True

DFname.set_index('columnname', inplace=True)

Or overwrite DFname:
DFname = DFname.set_index('columnname')
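
A small sketch of the copy-versus-overwrite behavior, with reset_index() to undo it. The player_id and name columns are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
DFname = pd.DataFrame({'player_id': [10, 20, 30], 'name': ['A', 'B', 'C']})

# set_index returns a new DataFrame; DFname keeps its default 0, 1, 2 index.
indexed = DFname.set_index('player_id')
original_index = list(DFname.index)

# Overwriting the variable keeps the new index.
DFname = DFname.set_index('player_id')

# reset_index moves the index back into a regular column.
restored = DFname.reset_index()
print(restored)
```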

18
Q

Most DataFrame methods return copies unless ________ is explicitly included

A

inplace=True

19
Q

The opposite of set_index() is ______

A

reset_index()

20
Q

How to sort a DF?

A

DFname.sort_values('columnname', ascending=False, inplace=True)
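
A quick sketch of sorting in descending order. The data is made up, and the result is assigned to a new variable rather than sorted in place.

```python
import pandas as pd

# Hypothetical example data.
DFname = pd.DataFrame({'name': ['A', 'B', 'C'], 'yards': [50, 120, 80]})

# Sort by yards, largest first; the original DataFrame is left untouched.
sorted_df = DFname.sort_values('yards', ascending=False)
print(sorted_df)
```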

21
Q

How to add a column from another DF?

A

NewDF['columnname'] = DFname['columnname']

This works when the two DFs share the same index (pg 56 FantasyFootball). It is similar to moving an Excel column from one workbook to another: each value has to land on the row it belongs to. Pandas matches the values by index label.
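
A sketch of how the index drives the match. The two DataFrames are hypothetical and deliberately list their rows in a different order.

```python
import pandas as pd

# Hypothetical data: same index labels, different row order.
DFname = pd.DataFrame({'yards': [120, 15, 80]}, index=['A', 'B', 'C'])
NewDF = pd.DataFrame({'pos': ['WR', 'RB', 'QB']}, index=['C', 'A', 'B'])

# Values are matched by index label, not by position.
NewDF['yards'] = DFname['yards']
print(NewDF)
```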

22
Q

How to write to csv

A

to_csv()

DFname.to_csv(path.join(DATA_DIR, 'filename.csv'), sep='|', index=False)

sep is the separator, in this case a pipe |

Use index=True to include the index in the output
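
A minimal round-trip sketch: write a pipe-separated file, then read it back. The temporary directory and the data are assumptions made so the example runs anywhere.

```python
import tempfile
from os import path

import pandas as pd

# Hypothetical output folder and data.
DATA_DIR = tempfile.mkdtemp()
DFname = pd.DataFrame({'name': ['A', 'B'], 'yards': [120, 15]})

out_file = path.join(DATA_DIR, 'filename.csv')
DFname.to_csv(out_file, sep='|', index=False)  # index=False leaves out the row IDs

# Peek at the header line, then load the file with the same separator.
with open(out_file) as f:
    first_line = f.readline().strip()
round_trip = pd.read_csv(out_file, sep='|')
print(first_line)
```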

23
Q

Three primary column types

A

Number
String
Boolean

24
Q

Add a column with a value or math

A

DFname['newcolumn'] = 4
Or
DFname['newcolumn'] = DFname['column2'] * 8

DFname[['newcolumn', 'column2']]
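
A short sketch of both cases: a constant column and a column computed from another one. Column names are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
DFname = pd.DataFrame({'column2': [1, 2, 3]})

DFname['constant'] = 4                        # same value on every row
DFname['newcolumn'] = DFname['column2'] * 8   # math applied row by row

print(DFname[['newcolumn', 'column2']])
```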

25
Q

______ is a library for math

A

NumPy

import numpy as np

26
Q

_____ method to return random rows

A

sample()
DFname.sample(5)

Returns a random 5 rows from DF

27
Q

How to concatenate strings?

A

+

28
Q

How to call string methods

A

.str

.str.upper()
.str.replace()

DFname['columnname'].str.replace('.', '', regex=False).str.lower()
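
A sketch of chained string methods and of + for concatenation. The names are made up. regex=False is passed so the period is treated as a literal character; some pandas versions would otherwise read '.' as a pattern matching any character.

```python
import pandas as pd

# Hypothetical example data.
DFname = pd.DataFrame({'columnname': ['A.B. Smith', 'C.D. Jones'],
                       'pos': ['RB', 'WR']})

# Remove periods, then lowercase.
cleaned = DFname['columnname'].str.replace('.', '', regex=False).str.lower()

# + concatenates string columns element by element.
label = DFname['columnname'] + ', ' + DFname['pos']

print(cleaned.tolist())
print(label.tolist())
```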

29
Q

How to negate (change True to False and vice versa)?

A

~

DFname['is_not'] = ~(DFname['column'] == 'RB')

30
Q

How to check multiple columns for True/False at once

A

(DF[['col1', 'col2']] > 100)

Returns a DataFrame with one True/False column per input column, evaluated for each row
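
A small sketch of flag columns: a comparison, its negation with ~, and a comparison across two columns at once. The data is hypothetical.

```python
import pandas as pd

# Hypothetical example data.
DF = pd.DataFrame({'pos': ['RB', 'WR', 'QB'],
                   'col1': [120, 40, 0],
                   'col2': [10, 130, 0]})

DF['is_rb'] = DF['pos'] == 'RB'          # True where pos is RB
DF['is_not_rb'] = ~(DF['pos'] == 'RB')   # ~ flips True and False

# Comparing several columns returns a DataFrame of Booleans.
over_100 = DF[['col1', 'col2']] > 100
print(over_100)
```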

31
Q

When you think ‘flagging rows’ you should be thinking

A

Making a column of Booleans (True/False)

32
Q

Which method takes a function and _____ it to every row in a column?

A

apply()
applies

def is_skill(pos):
    return pos in ['rb', 'wr']

DF['is_skill'] = DF['pos'].apply(is_skill)

Applies the defined function is_skill to every row in the column

Alternative:
DF['is_skill'] = DF['pos'].apply(
    lambda x: x in ['rb', 'wr'])

33
Q

Method to drop a column

A

DF.drop('col', axis=1, inplace=True)

Default is to drop rows, axis = 1 changes to drop column

34
Q

Method to rename a column

A

DF.rename(columns={'col1': 'newname'}, inplace=True)
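
A sketch of dropping and renaming columns, written with reassignment instead of inplace=True. Column names are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
DF = pd.DataFrame({'col': [1, 2], 'col1': [3, 4]})

DF = DF.drop('col', axis=1)                  # axis=1 targets a column, not a row
DF = DF.rename(columns={'col1': 'newname'})  # map old name to new name

print(list(DF.columns))
```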

35
Q

How does numpy represent a missing value?

A

nan

Not a Number

36
Q

Methods to detect nan or null value?

A

isnull()
notnull()

DF['col'].isnull()
Returns a Boolean Series with True/False for each row

37
Q

Method to place custom value in place of nan?

A

fillna()

DF['col'].fillna(-99)
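
A sketch of detecting and replacing missing values. The column and the fill value -99 follow the cards above; the data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing value.
DF = pd.DataFrame({'col': [1.0, np.nan, 3.0]})

missing = DF['col'].isnull()    # True where the value is nan
present = DF['col'].notnull()   # the opposite flags
filled = DF['col'].fillna(-99)  # nan replaced by a custom value

print(missing.tolist())
print(filled.tolist())
```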

38
Q

How to parse day, month, and year from string in non-pandas Python?

A

gameid = '2021090700'

year = gameid[0:4]
month = gameid[4:6]
day = gameid[6:8]
39
Q

How to parse day, month, and year in Pandas, including changing the data type?

A

gameid = '2021090700'

DF['month'] = DF['gameid'].astype(str).str[4:6]
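
A sketch extending the same slicing to year and day, assuming gameid is stored as an integer in YYYYMMDD plus a two-digit suffix. The second gameid is made up, and converting the pieces back with astype(int) is an optional extra step.

```python
import pandas as pd

# Hypothetical example data; gameid stored as integers.
DF = pd.DataFrame({'gameid': [2021090700, 2021101003]})

# Convert to text once, then slice by position.
as_text = DF['gameid'].astype(str)
DF['year'] = as_text.str[0:4].astype(int)
DF['month'] = as_text.str[4:6].astype(int)
DF['day'] = as_text.str[6:8].astype(int)

print(DF)
```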

40
Q

How to change a datatype?

A

astype(str)

astype(int)

41
Q

Which attribute shows data type of each column in DF?

A

dtypes

DF.dtypes (a single column uses dtype)

42
Q

Pandas calls a string (str) a ______

A

object

43
Q

Name six summary statistic functions.

A
mean()
std()
count()
sum()
min()
max()
Note that min() and max() also work with strings, using alphabetical order
44
Q

What is the axis default for summary statistic functions, and how can it be changed?

A

The default (axis=0) summarizes each column; pass axis=1 to summarize each row instead
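
A sketch of the two axis settings using sum(). The yardage columns and values are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
DF = pd.DataFrame({'rush_yards': [100, 20], 'rec_yards': [10, 80]})

by_column = DF.sum()      # default axis=0: one total per column
by_row = DF.sum(axis=1)   # axis=1: one total per row

print(by_column)
print(by_row)
```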

45
Q

In Summary Stats, what values will be returned for True & False

A

True = 1; False = 0

46
Q

.any() evaluates what?

A

If any value in a column is True

47
Q

.all() evaluates what?

A

If all values in a column are True

48
Q

Code to determine how often a certain criterion is met?

A

(pg[['rush_yards', 'rec_yards']] > 100).any(axis=1).sum()
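
A step-by-step sketch of that one-liner on made-up data: flag each value, collapse each row with any(), then count the True rows with sum().

```python
import pandas as pd

# Hypothetical per-game data.
pg = pd.DataFrame({'rush_yards': [120, 40, 0, 101],
                   'rec_yards': [10, 130, 20, 105]})

# True for a row if either column is over 100.
flags = (pg[['rush_yards', 'rec_yards']] > 100).any(axis=1)

# True counts as 1 and False as 0, so sum() counts the flagged rows.
count = flags.sum()
print(count)
```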

49
Q

What does value_counts() do?

A

Counts how often each value appears in a column:
DF['position'].value_counts()

RB 20
WR 10
QB 10

To return proportions instead of counts:
DF['position'].value_counts(normalize=True)

RB 0.50
WR 0.25
QB 0.25
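
A runnable sketch of both forms on a tiny made-up column. With normalize=True the result is a proportion between 0 and 1, not a percentage.

```python
import pandas as pd

# Hypothetical example data.
DF = pd.DataFrame({'position': ['RB', 'RB', 'WR', 'QB']})

counts = DF['position'].value_counts()                 # raw counts
shares = DF['position'].value_counts(normalize=True)   # proportions

print(counts)
print(shares)
```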

50
Q

What does crosstab do, and what is the syntax?

A

Similar to value_counts(), but counts the combinations of values across two columns

pd.crosstab(adp['team'], adp['position'])

Crosstab also takes a normalize argument
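
A sketch of crosstab on made-up data. The team labels X and Y are placeholders, not real teams.

```python
import pandas as pd

# Hypothetical example data.
adp = pd.DataFrame({'team': ['X', 'X', 'Y', 'Y'],
                    'position': ['RB', 'WR', 'RB', 'RB']})

# Rows are teams, columns are positions, cells are counts.
table = pd.crosstab(adp['team'], adp['position'])
print(table)
```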

51
Q

How to see all Pandas methods that operate on single columns? On DataFrames?

A

Type pd.Series. and press Tab to autocomplete

Type pd.DataFrame. and press Tab to autocomplete