Pandas_JAS Flashcards
How to refer to a single column in a DF
DFname['columnname'].head()  # head() returns the first 5 rows by default
Technically, a single column is a ______ not a ______
Series; DataFrame
A Series is part of a DataFrame
After typing code into the ______, highlight the code of interest and hit ____ to send it to the ________
Editor
F9
REPL (console)
Normally you install third-party libraries with a tool like _____, but if you’re using ________, it comes with Pandas installed
pip
Anaconda Python Bundle
When importing Pandas the convention is to name it ______
pd
import pandas as pd
To load a csv file:
DFname = pd._______(_______(DATA_DIR, 'filename.csv'))
read_csv
path.join
DATA_DIR is a variable holding the path to the directory that contains your file, e.g., DATA_DIR = '/Users/UserName/PythonDirectory'. path.join requires: from os import path
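A minimal sketch of the load step, assuming a throwaway temporary directory stands in for your real DATA_DIR and a tiny made-up file stands in for your data (both are illustration only):

```python
import tempfile
from os import path

import pandas as pd

# Assumption: a temporary directory plays the role of DATA_DIR
DATA_DIR = tempfile.mkdtemp()

# Write a small hypothetical CSV so the example runs on its own
with open(path.join(DATA_DIR, "filename.csv"), "w") as f:
    f.write("name,pos\nA,RB\nB,WR\n")

# path.join builds the full file path; read_csv returns a DataFrame
df = pd.read_csv(path.join(DATA_DIR, "filename.csv"))
print(df.head())
```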
What result does this give:
type(DFname)
pandas.core.frame.DataFrame
What method is called to give (by default) the first 5 rows of a DataFrame?
head()
DFname.head()
head() is a method because you can pass it the number of rows to print; ________ are used without parentheses and take no arguments
attributes
the attribute _______ returns the names of each column in the DF
columns
the attribute ______ returns the number of rows and columns in the DF
shape
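The method/attribute difference can be sketched as follows. The DataFrame and its column names are made up for illustration:

```python
import pandas as pd

# Hypothetical data: 7 rows, 2 columns
df = pd.DataFrame({"name": list("ABCDEFG"), "points": [7, 3, 9, 1, 4, 8, 2]})

first_five = df.head()     # method: parentheses, defaults to 5 rows
first_two = df.head(2)     # method: accepts an argument
cols = df.columns          # attribute: no parentheses
n_rows, n_cols = df.shape  # attribute: (rows, columns)

print(list(cols), n_rows, n_cols)
```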
______ will turn any Series into a one-column DF
to_frame()
DFname['columnname'].to_frame().head()
to refer to multiple columns in a DF, you pass it a ______, and the result is _______
list
a DataFrame
DFname[['col1', 'col2', 'col3']].head()
When selecting multiple columns you must use double brackets: the outer pair indexes the DataFrame, the inner pair is the list of column names
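A short sketch comparing the three ways of selecting columns covered so far. The column names and values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

one_col = df["col1"]              # single brackets -> Series
as_frame = df["col1"].to_frame()  # Series -> one-column DataFrame
subset = df[["col1", "col3"]]     # list of names -> DataFrame

print(type(one_col).__name__, type(as_frame).__name__, type(subset).__name__)
```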
An index is a built-in column of ________. If you don’t designate a column as the index, the default is a ________.
row IDs
series of numbers starting at zero
Indexes can be _______
Any type of data (strings, dates, etc)
How to assign an index to a DF?
DFname.set_index('columnname')
This returns a copy of the DF with the new index; the original DF is unchanged.
What must be passed to set_index() to create the index on the original DF?
inplace=True
DFname.set_index('columnname', inplace=True)
Or overwrite DFname
DFname = DFname.set_index('columnname')
Most DataFrame methods return copies unless ________ is explicitly included
inplace=True
The opposite of set_index() is ______
reset_index()
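The index cards above can be sketched like this, using a made-up `player_id` column as the index:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"player_id": [101, 102, 103], "name": ["A", "B", "C"]})

indexed = df.set_index("player_id")  # returns a new DataFrame; df keeps 0, 1, 2
restored = indexed.reset_index()     # moves the index back into a regular column

print(indexed)
print(restored)
```

Reassigning the result, as here, is an alternative to passing inplace=True.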
How to sort a DF?
DFname.sort_values('columnname', ascending=False, inplace=True)
How to add a column from another DF?
NewDF['columnname'] = DFname['columnname']
This works when both DFs share the same index (pg 56 FantasyFootball). It is similar to cutting an Excel column from one workbook and pasting it into another: each value has to land in the row to which it belongs.
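A small sketch of why the shared index matters: pandas lines values up by index label, not by row position. The two DataFrames and their labels are hypothetical:

```python
import pandas as pd

stats = pd.DataFrame({"yards": [120, 45, 80]}, index=["p1", "p2", "p3"])
# Same index labels, deliberately stored in a different row order
info = pd.DataFrame({"pos": ["WR", "RB", "QB"]}, index=["p3", "p1", "p2"])

# Assignment aligns on the index label, so each value lands in its own row
stats["pos"] = info["pos"]
print(stats)
```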
How to write to csv
to_csv()
DFname.to_csv(path.join(DATA_DIR, 'filename.csv'), sep='|', index=False)
sep sets the separator, in this case a pipe |
Use index=True (the default) to include the index in the output
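A minimal sketch of writing a pipe-separated file and reading the raw text back. The temporary directory stands in for DATA_DIR and the data is made up:

```python
import tempfile
from os import path

import pandas as pd

# Assumption: a temporary directory plays the role of DATA_DIR
DATA_DIR = tempfile.mkdtemp()

df = pd.DataFrame({"name": ["A", "B"], "pos": ["RB", "WR"]})
out_file = path.join(DATA_DIR, "filename.csv")

# index=False leaves the 0, 1, ... row labels out of the file
df.to_csv(out_file, sep="|", index=False)

with open(out_file) as f:
    text = f.read()
print(text)
```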
Three primary column types
Number
String
Boolean
Add a column with a value or math
DFname['newcolumn'] = 4
Or
DFname['newcolumn'] = DFname['column2'] * 8
DFname[['newcolumn', 'column2']]
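Both ways of adding a column, sketched on a made-up DataFrame (the name `constant` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"column2": [1, 2, 3]})

df["constant"] = 4                   # same value in every row
df["newcolumn"] = df["column2"] * 8  # math applied row by row

print(df[["newcolumn", "column2"]])
```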
______ is a library for math
NumPy
import numpy as np
_____ method to return random rows
sample()
DFname.sample(5)
Returns a random 5 rows from DF
How to concatenate strings?
+
How to call string methods
.str
.str.upper()
.str.replace()
DFname['columnname'].str.replace('.', '', regex=False).str.lower()
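A sketch that combines concatenation with chained string methods. The names are made up, and `regex=False` is passed so the period is treated as a literal character rather than a regex wildcard:

```python
import pandas as pd

# Hypothetical names for illustration
df = pd.DataFrame({"first": ["A.J.", "T.J."], "last": ["Smith", "Jones"]})

# + concatenates strings element by element
df["full"] = df["first"] + " " + df["last"]

# .str methods can be chained: strip periods, then lowercase
df["clean"] = df["full"].str.replace(".", "", regex=False).str.lower()

print(df)
```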
How to negate (change True to False and vice versa)?
~
DFname['is_not'] = ~(DFname['column'] == 'RB')
How to check multiple columns for True/False at once
(DF[['col1', 'col2']] > 100)
Returns a DataFrame of True/False values, one column for each column checked and one value per row
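A short sketch of Boolean columns, negation, and checking several columns at once. All names and values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "pos": ["RB", "WR", "RB"],
    "col1": [150, 90, 101],
    "col2": [20, 130, 100],
})

df["is_rb"] = df["pos"] == "RB"   # Boolean column
df["is_not_rb"] = ~df["is_rb"]    # ~ flips True and False

over_100 = df[["col1", "col2"]] > 100  # Boolean DataFrame, same shape as the selection
print(over_100)
```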
When you think ‘flagging rows’ you should be thinking
Making a column of Booleans (True/False)
Which method takes a function and _____ it to every row in a column?
apply()
applies
def is_skill(pos):
    return pos in ['rb', 'wr']
DF['is_skill'] = DF['column'].apply(is_skill)
Applies the defined function is_skill to every row in the column
Alternative:
DF['is_skill'] = DF['pos'].apply(
    lambda x: x in ['rb', 'wr'])
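A runnable sketch of both apply() forms on made-up data. The last line uses `isin()`, which is not from these cards; it is shown only as another pandas method that gives the same result for this particular membership check:

```python
import pandas as pd

def is_skill(pos):
    """Return True for running backs and wide receivers."""
    return pos in ["rb", "wr"]

# Hypothetical data for illustration
df = pd.DataFrame({"pos": ["qb", "rb", "wr", "te"]})

df["is_skill"] = df["pos"].apply(is_skill)                          # named function
df["is_skill_lambda"] = df["pos"].apply(lambda x: x in ["rb", "wr"])  # lambda
df["is_skill_isin"] = df["pos"].isin(["rb", "wr"])                  # assumption: alternative not in the cards

print(df)
```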
Method to drop a column
DF.drop('col', axis=1, inplace=True)
The default (axis=0) drops rows; axis=1 drops a column
Method to rename a column
DF.rename(columns={'col1': 'newname'}, inplace=True)
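Dropping and renaming, sketched with reassignment instead of inplace=True. The column names are made up:

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2], "col1": [3, 4], "keep": [5, 6]})

df = df.drop("col", axis=1)                  # axis=1 -> drop a column
df = df.rename(columns={"col1": "newname"})  # old name -> new name

print(list(df.columns))
```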
How does numpy represent a missing value?
nan
Not a Number
Methods to detect nan or null value?
isnull()
notnull()
DF['col'].isnull()
Returns a Series with True/False for each row
Method to place custom value in place of nan?
fillna()
DF['col'].fillna(-99)
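A minimal sketch of detecting and filling missing values, using a made-up column with one nan:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})

missing = df["col"].isnull()    # True where the value is nan
present = df["col"].notnull()   # the opposite
filled = df["col"].fillna(-99)  # new Series; df itself is unchanged

print(filled)
```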
How to parse day, month, and year from string in non-pandas Python?
gameid = '2021090700'
year = gameid[0:4]
month = gameid[4:6]
day = gameid[6:8]
How to parse day, month, and year in Pandas, including changing the data type?
gameid = '2021090700'
DF['month'] = DF['gameid'].astype(str).str[4:6]
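A sketch extending the same slicing idea to year, month, and day. It assumes gameid is stored as an integer in YYYYMMDD plus a two-digit suffix; the final astype(int) conversion is an added step, not part of the card:

```python
import pandas as pd

# Hypothetical game IDs stored as integers
df = pd.DataFrame({"gameid": [2021090700, 2021091201]})

gameid_str = df["gameid"].astype(str)  # convert once, then slice with .str

df["year"] = gameid_str.str[0:4].astype(int)
df["month"] = gameid_str.str[4:6].astype(int)
df["day"] = gameid_str.str[6:8].astype(int)

print(df)
```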
How to change a datatype?
astype(str)
astype(int)
Which attribute shows data type of each column in DF?
dtypes
DF.dtypes
Pandas calls a string (str) a ______
object
Name six summary statistic functions.
mean() std() count() sum() min() max()
Note that min() and max() also work with strings, using alphabetical order
What is the axis default for summary statistic functions, and how can it be changed?
The default is axis=0, which summarizes each column; pass axis=1 to summarize each row instead
In summary stats, what values are used for True and False?
True = 1; False = 0
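A small sketch of the axis argument and of Booleans counting as 1 and 0. The data is made up:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_means = df.mean()      # default axis=0: one value per column
row_sums = df.sum(axis=1)  # axis=1: one value per row

flags = pd.Series([True, False, True, True])
n_true = flags.sum()       # True counts as 1, False as 0
share_true = flags.mean()  # fraction of rows that are True

print(col_means, row_sums, n_true, share_true, sep="\n")
```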
.any() evaluates what?
If any value in a column is True
.all() evaluates what?
If all values in a column are True
Code to determine how often a certain criterion is met?
(pg[['rush_yards', 'rec_yards']] > 100).any(axis=1).sum()
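The counting pattern above, broken into steps on made-up data (pg here is a hypothetical four-row table):

```python
import pandas as pd

pg = pd.DataFrame({
    "rush_yards": [120, 30, 5, 101],
    "rec_yards": [10, 110, 40, 105],
})

over_100 = pg[["rush_yards", "rec_yards"]] > 100  # Boolean DataFrame

n_either = over_100.any(axis=1).sum()  # rows where at least one column is over 100
n_both = over_100.all(axis=1).sum()    # rows where both columns are over 100

print(n_either, n_both)
```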
What does value_counts() do?
Counts how often each distinct value occurs in a column:
DF['position'].value_counts()
WR 10
RB 20
QB 10
To summarize by relative frequency (proportions that sum to 1):
DF['position'].value_counts(normalize=True)
WR 0.25
RB 0.50
QB 0.25
What does crosstab do, and what is the syntax?
Similar to value_counts(), but returns counts for each combination of values in two columns
pd.crosstab(adp['team'], adp['position'])
Crosstab also takes a normalize argument
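A sketch of value_counts() and crosstab() side by side. The adp table here is a made-up four-row stand-in:

```python
import pandas as pd

adp = pd.DataFrame({
    "team": ["X", "X", "Y", "Y"],
    "position": ["RB", "WR", "RB", "RB"],
})

counts = adp["position"].value_counts()                # counts per value
shares = adp["position"].value_counts(normalize=True)  # proportions per value
table = pd.crosstab(adp["team"], adp["position"])      # counts per team/position pair

print(counts, shares, table, sep="\n\n")
```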
How to see all Pandas methods that operate on single columns? On DataFrames?
Type pd.Series. and tab complete
Type pd.DataFrame. and tab complete