Pandas Flashcards
dimension of a Series
1
dimension of a DataFrame
2
create a Series object from a list, index it using another list
ser=pd.Series(data=list1, index=list2)
create a Series using a NumPy array
import numpy as np
import pandas as pd
arr = np.array([1, 2, 3, 4])
ser = pd.Series(arr)
create a Series using a dictionary with keys as the index
ser = pd.Series(d)  # d is a dict; avoid naming it dict, which shadows the built-in
from the Series ser access the element with index ‘k’
ser['k'] #just like a dict
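A minimal sketch of the Series cards above, assuming small made-up sample data (the names labels, values, and the ser_from_* variables are illustrative, not part of the cards):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
labels = ["a", "b", "k"]
values = [10, 20, 30]

# From a list, indexed by another list of the same length.
ser_from_list = pd.Series(data=values, index=labels)

# From a NumPy array; without an index argument pandas uses 0..n-1.
ser_from_array = pd.Series(np.array([1, 2, 3, 4]))

# From a dict; the keys become the index.
ser_from_dict = pd.Series({"a": 10, "b": 20, "k": 30})

print(ser_from_list["k"])    # label-based access, like a dict
print(ser_from_array.ndim)   # a Series is one-dimensional
```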
Describe pd.DataFrame
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
What arguments does pd.DataFrame take?
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
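A small sketch of how the constructor arguments fit together, assuming invented values and labels (the row labels A, B and column names one, two are only for illustration):

```python
import pandas as pd

# data: the values; index: row labels; columns: column labels;
# dtype: a single dtype to force on all columns.
df = pd.DataFrame(
    data=[[1, 2.5], [3, 4.5]],
    index=["A", "B"],
    columns=["one", "two"],
    dtype=float,
)

print(df.ndim)    # a DataFrame is two-dimensional
print(df.shape)   # (rows, columns)
```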
from DataFrame df grab the columns ‘name’ and ‘age’
mind the two brackets
df[['name', 'age']]
from DataFrame df grab the row with index ‘B’ as a Series
df.loc['B']
from DataFrame df grab columns ‘one’, ‘three’ intersections with rows ‘B’, ‘D’
df.loc[['B', 'D'], ['one', 'three']]
from DataFrame df, grab the value at row position 3, column position 2
df.iloc[3,2]
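The selection cards above can be tried on a toy frame. This is a sketch under the assumption of a 4x3 integer frame with rows A to D and columns one, two, three (made-up data, not from the cards):

```python
import numpy as np
import pandas as pd

# Rows A..D hold the values 0..11, three per row.
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=["A", "B", "C", "D"],
    columns=["one", "two", "three"],
)

cols = df[["one", "three"]]                    # list of labels -> DataFrame
row_b = df.loc["B"]                            # one row label -> Series
block = df.loc[["B", "D"], ["one", "three"]]   # label-based rows x columns
cell = df.iloc[3, 2]                           # integer positions, zero-based

print(cell)  # 11: last row (D), last column (three)
```

loc selects by label and iloc by integer position, so the two only coincide when the index happens to be 0..n-1.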
create a new column Total which shows the sum of the columns ‘C’, ‘D’, and ‘E’
df['Total'] = df['C'] + df['D'] + df['E']
in df, delete the row with index ‘F’
df.drop('F', axis=0, inplace=True)
in df, delete the column named ‘Total’
df.drop('Total', axis=1, inplace=True)
in df, create a new column named ‘Sex’ and assign it as the index
df['Sex'] = ['Men', 'Women']  # list length must equal the number of rows
df.set_index('Sex', inplace=True)
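A sketch that chains the column and row edits from the last few cards, assuming a made-up two-row frame. It reassigns the result instead of using inplace=True; both styles work, and reassignment is shown here only as an alternative:

```python
import pandas as pd

# Hypothetical two-row frame so that the 'Sex' list below fits.
df = pd.DataFrame(
    {"C": [1, 2], "D": [10, 20], "E": [100, 200]},
    index=["F", "G"],
)

df["Total"] = df["C"] + df["D"] + df["E"]   # row-wise sum of three columns
totals = df["Total"].tolist()               # keep a copy before dropping

df["Sex"] = ["Men", "Women"]                # one value per row
df = df.set_index("Sex")                    # 'Sex' replaces the old F/G index
df = df.drop("Total", axis=1)               # axis=1 drops a column
df = df.drop("Men", axis=0)                 # axis=0 drops a row by label

print(totals)
print(df)
```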
Where does pandas beat NumPy?
NumPy’s ndarray data structure provides essential features for the type of
clean, well-organized data typically seen in numerical computing tasks. While it
serves this purpose very well, its limitations become clear when we need more flexibility
(attaching labels to data, working with missing data, etc.) and when attempting
operations that do not map well to element-wise broadcasting (groupings, pivots,
etc.), each of which is an important piece of analyzing the less structured data available
in many forms in the world around us. Pandas, and in particular its Series and
DataFrame objects, builds on the NumPy array structure and provides efficient access
to these sorts of “data munging” tasks that occupy much of a data scientist’s time.
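To make the answer above concrete, here is a small sketch of two of the points it names, missing data and grouping. The team and score columns are invented for illustration:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    {
        "team": ["red", "blue", "red", "blue"],
        "score": [1.0, np.nan, 3.0, 5.0],   # one missing value
    }
)

# pandas skips NaN by default when aggregating.
mean_all = scores["score"].mean()

# Grouping by a label column is a one-liner in pandas.
by_team = scores.groupby("team")["score"].mean()

# The plain NumPy mean over the same values propagates NaN.
raw_mean = np.mean(scores["score"].to_numpy())

print(mean_all)
print(by_team)
print(raw_mean)
```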
What does this DataFrame call return: df['x']
The column labeled ‘x’ as a Series, not the row with index ‘x’
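A quick check of this card on a toy frame, assuming made-up columns x, y and row labels a, b:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])

col = df["x"]          # square brackets with one label look up a column

# A row label is not found by plain brackets on this frame.
try:
    df["a"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

row = df.loc["a"]      # rows are selected by label with .loc

print(type(col).__name__, row_lookup_failed)
```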