Numpy and Pandas Fundamentals Flashcards
Fundamentals of Numpy and Pandas from the Pandas book
1
Q
pd.Series
A
- Index labels (and the Series name) can be any hashable type
- missing values (NaN) are skipped by default in reductions such as sum and mean
- Referencing values by label, dictionary-style notation, and NumPy boolean masking all work
- Passing a dictionary d is equivalent to pd.Series(list(d.values()), index=list(d.keys()))
- Arithmetic automatically aligns differently indexed Series by label
- Index can be altered in place by assignment
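A minimal sketch of the Series behaviours listed on this card. The variable names (prices, other, combined) and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Building a Series from a dict: keys become the index, values the data.
prices = pd.Series({"a": 1.0, "b": 2.0, "c": np.nan})

# Label lookup and boolean masking both work.
first = prices["a"]
above_one = prices[prices > 1.0]

# NaN is skipped by reductions unless skipna=False is passed.
total = prices.sum()

# Arithmetic aligns on index labels; a label missing on either side gives NaN.
other = pd.Series({"b": 10.0, "c": 20.0, "d": 30.0})
combined = prices + other

# The index can be replaced in place by assignment.
prices.index = ["x", "y", "z"]
```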
2
Q
pd.DataFrame
A
- Retrieve column by dict-like or attribute syntax
- Get rows by label with frame.loc['row_index'] (written frame.ix['row_index'] in older pandas; .ix was removed in pandas 1.0)
- accepts a list of dicts, a dict of dicts, or a dict of the form 'key': list where all lists (or np.arrays) have equal length
- del dataframe['colname'] works as expected
- columns and row indices can have names just like Series
- frame.values returns a 2d array (rows, columns)
- Accepts Numpy masked arrays. ‘masked’ elements are NA/missing in the result
- list of lists or list of tuples defaults to passing in row-wise
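The construction and access patterns on this card could look roughly like the following. The column names, labels, and values are hypothetical, and .loc is used for row lookup since .ix is no longer available in current pandas.

```python
import pandas as pd

# Dict of equal-length lists: each key becomes a column.
frame = pd.DataFrame(
    {"state": ["Ohio", "Nevada"], "population": [1.5, 2.4]},
    index=["one", "two"],
)

# A column comes back as a Series via dict-style or attribute access.
by_key = frame["state"]
by_attr = frame.state

# A row is selected by label with .loc.
row = frame.loc["one"]

# A list of lists is read row-wise.
from_rows = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# del removes a column in place.
del frame["population"]

# Index and columns can carry names; to_numpy() gives a (rows, columns) array.
from_rows.index.name = "row"
from_rows.columns.name = "col"
shape = from_rows.to_numpy().shape
```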
3
Q
index objects
A
methods
- append
- difference (called diff in older pandas)
- intersection
- union
- isin
- delete
- drop
- insert
- is_monotonic (is_monotonic_increasing in current pandas)
- is_unique
- unique
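A short sketch of these Index operations, using two made-up indexes (left and right). It uses the current names difference and is_monotonic_increasing.

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])

# Set-like operations.
both = left.intersection(right)
either = left.union(right)
only_left = left.difference(right)
mask = left.isin(["a", "c"])

# Index objects are immutable, so these all return new Index objects.
appended = left.append(pd.Index(["z"]))
without_first = left.delete(0)      # remove by position
without_b = left.drop("b")          # remove by label
with_insert = left.insert(1, "q")   # insert at position 1

# Properties rather than methods.
increasing = left.is_monotonic_increasing
all_unique = left.is_unique
```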
4
Q
reindex
A
- creates a new object that conforms to the new index
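- the method option fills gaps introduced by the new index, e.g. ffill or bfill
- With a DataFrame, reindex acts on rows by default; columns can be reindexed with the columns keyword. New columns are filled with NaN, and columns left out of the new list are dropped.
- the limit option sets the maximum number of consecutive gaps that ffill or bfill will fill
- fill_value sets the value used (instead of NaN) for labels that don’t exist in the original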
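The reindex options above can be sketched as follows. The data and labels are invented for illustration.

```python
import pandas as pd

obj = pd.Series(["blue", "purple"], index=[0, 3])

# Forward-fill every gap introduced by the new index.
filled = obj.reindex(range(6), method="ffill")

# limit=1 fills at most one consecutive gap after each known value.
limited = obj.reindex(range(6), method="ffill", limit=1)

frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Rows are reindexed by default; fill_value replaces the default NaN.
more_rows = frame.reindex(["x", "y", "z"], fill_value=0)

# Columns need the keyword: "c" is new (NaN), "a" is left out and dropped.
new_cols = frame.reindex(columns=["b", "c"])
```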
5
Q
index selection and filtering
A
Series and DataFrames
- label-based slicing differs from normal Python slicing: the endpoint is inclusive
- setting works as expected: obj['b':'c'] = 5 puts 5 in the b and c slots
- obj[[1,3]] pulls the second and fourth elements (positions 1 and 3) from a Series (not a DataFrame); current pandas spells this obj.iloc[[1,3]]
- obj[['a','d','b']] pulls elements by label, in that order
DataFrames
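- df[:2] returns rows 0 and 1 (an integer slice is positional, so the endpoint is excluded, unlike the label slices above)
- df[df['col'] > 5] returns the rows where the condition is true
- df[df < 0] = 0 sets all elements less than 0 to 0
- df.ix[row_criteria, col_criteria] selects rows and columns together (df.loc in current pandas; .ix was removed in pandas 1.0)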
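A small sketch of the selection rules on this card, with hypothetical data. Positional access uses .iloc and the combined row/column selection uses .loc, the current replacements for integer lookup through [] and for .ix.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=["a", "b", "c", "d"])

by_label = obj["b":"c"]            # label slice: both endpoints included
by_position = obj.iloc[[1, 3]]     # positions 1 and 3: second and fourth values
reordered = obj[["a", "d", "b"]]   # list of labels: returned in the order given

obj["b":"c"] = 5                   # assignment through a label slice

df = pd.DataFrame(
    np.arange(-4, 5).reshape(3, 3),
    index=["r0", "r1", "r2"],
    columns=["x", "y", "z"],
)

head = df[:2]                      # positional slice: rows 0 and 1 only
positive_x = df[df["x"] > 0]       # boolean Series filters rows
df[df < 0] = 0                     # boolean DataFrame assigns element-wise
subset = df.loc[["r0", "r2"], ["x", "z"]]  # rows and columns together
```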
6
Q
indexing options with DataFrame
A
- obj[val]: select a single column or a sequence of columns; special cases: boolean array (filter rows), slice (slice rows), boolean DataFrame (set values by criterion)
- obj.ix[val]: select a single row or a subset of rows
- obj.ix[:, val]: select a single column or a subset of columns
- obj.ix[val1, val2]: select both rows and columns
- reindex: conform one or more axes to a new index
- xs method: select a single row or column as a Series by label
- icol, irow methods: select a column or row, respectively, as a Series by integer location
- get_value, set_value: select or set a single value by row and column label
- .ix, icol, irow, get_value and set_value have since been removed from pandas; .loc, .iloc and .at cover the same cases
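As a rough guide, the options above map onto current pandas as sketched below. This is one possible translation (using .loc, .iloc and .at), and the frame and labels are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["r0", "r1", "r2"],
    columns=["x", "y", "z"],
)

cols = df[["z", "x"]]                  # sequence of columns
row_subset = df.loc[["r0", "r2"]]      # subset of rows by label
col_subset = df.loc[:, ["x", "y"]]     # subset of columns by label
cell_block = df.loc["r1", ["x", "z"]]  # rows and columns together

cross = df.xs("r2")                    # row as a Series by label
cross_col = df.xs("z", axis=1)         # column as a Series by label

nth_row = df.iloc[0]                   # row by integer position
nth_col = df.iloc[:, 2]                # column by integer position

value = df.at["r1", "y"]               # read a single value by labels
df.at["r1", "y"] = 100                 # set a single value the same way
```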
7
Q
Arithmetic methods on DataFrames
A
- add
- sub
- mul
- div
Each is a method on DataFrame with an optional fill_value argument for elements that have no match in the other object.
So adding elements that don’t have a match produces NaN, but with fill_value=0 the missing side is treated as 0, so the unmatched value passes through unchanged.
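To illustrate the difference between the + operator and add with fill_value, here is a sketch with two small, made-up frames that only share the row label "y".

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0]}, index=["x", "y"])
df2 = pd.DataFrame({"a": [10.0, 20.0]}, index=["y", "z"])

# Only "y" exists in both frames, so "x" and "z" come out as NaN.
plain = df1 + df2

# With fill_value=0 the missing side counts as 0 before adding.
filled = df1.add(df2, fill_value=0)
```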
8
Q
Arithmetic between Series and DataFrames
A
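- df - series: matches the Series index against the DataFrame’s columns and broadcasts the operation down all the rows
- if a label is found in only one of the two, the result is reindexed to the union of labels and that column is NaN
- to match on the row index and broadcast across the columns instead, use the DataFrame arithmetic methods with axis=0
ex. dframe.sub(series, axis=0)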
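The two broadcasting directions can be sketched like this, with an invented frame and Series.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(6.0).reshape(2, 3),
    index=["r0", "r1"],
    columns=["b", "d", "e"],
)

# Default: the Series index is matched to the columns, broadcast down the rows.
first_row = frame.iloc[0]
by_row = frame - first_row

# Labels found on only one side ("d", "e", "f") end up as NaN columns.
partial = pd.Series([1.0, 2.0], index=["b", "f"])
unioned = frame + partial

# axis=0: the Series index is matched to the row index instead.
first_col = frame["b"]
by_col = frame.sub(first_col, axis=0)
```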
9
Q
Function application and mapping
A
- numpy ufuncs (element-wise array methods) work with dataframes and series
- DataFrame has an apply method, like R’s apply. However, axis=0 applies the function to each column (like colSums in R), and axis=1 applies it to each row (like rowSums in R)
- Function to apply need not return a scalar - can also be a Series object.
- applymap performs element-wise operations, ex. format = lambda x: '%.2f' % x (applymap was renamed DataFrame.map in pandas 2.1)
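A sketch of the function-application options on this card. The helper names span and low_high are hypothetical. Because the element-wise DataFrame method is named differently across pandas versions, the formatting step here uses Series.map on a single column instead.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[-1.0, 2.0], [3.0, -4.0]], columns=["a", "b"])

# NumPy ufuncs work element-wise on a DataFrame.
absolute = np.abs(frame)


def span(x):
    """Range of a column or row (hypothetical helper)."""
    return x.max() - x.min()


per_column = frame.apply(span)        # axis=0: one result per column
per_row = frame.apply(span, axis=1)   # axis=1: one result per row


def low_high(x):
    """Return a Series, so apply builds a DataFrame (hypothetical helper)."""
    return pd.Series([x.min(), x.max()], index=["min", "max"])


summary = frame.apply(low_high)

# Element-wise formatting of one column.
formatted = frame["a"].map(lambda x: "%.2f" % x)
```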