Pandas Flashcards
If you pass a dictionary to the data parameter of pd.Series and pass nothing to the index parameter,
then Pandas will use the dictionary keys as the index labels.
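A minimal sketch of this behaviour, using a made-up dictionary (the names prices and ser_sub are only for illustration). It also shows what happens when index is passed together with a dictionary: the labels are looked up as keys, and labels with no matching key get NaN.

```python
import pandas as pd

# Hypothetical data, not from the flashcards
prices = {"apple": 1.2, "banana": 0.5, "cherry": 3.0}

# No index argument: the dictionary keys become the index labels
ser = pd.Series(prices)
labels = list(ser.index)

# With an explicit index, values are selected by key; unknown keys give NaN
ser_sub = pd.Series(prices, index=["cherry", "apple", "kiwi"])
```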
How to import pandas and check the version?
import pandas as pd
print(pd.__version__)
pd.show_versions(as_json=True)  # prints the report itself and returns None, so no print() is needed
Create a pandas series from each of the items below: a list, a NumPy array, and a dictionary.
pd.Series(item)
What does zip(*iterables) do?
The function takes iterables as arguments and returns an iterator of tuples. The i-th tuple contains the i-th element from each iterable, and iteration stops when the shortest iterable is exhausted.
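A short illustration with two made-up lists of unequal length. The second call shows the common "unzip" idiom, where * unpacks the list of pairs into separate arguments.

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3, 4]  # one element longer than letters

# zip stops at the shortest input, so the trailing 4 is dropped
pairs = list(zip(letters, numbers))

# Unpacking the pairs with * regroups them by position ("unzipping")
unzipped = list(zip(*pairs))
```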
Convert the series ser into a dataframe with its index as another column on the dataframe
df = ser.to_frame().reset_index()
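As a sketch of what the result looks like, assuming a small series with string labels (the column name "value" is an illustrative choice): the old index becomes an ordinary column named "index" and a fresh integer index is created.

```python
import pandas as pd

# Hypothetical series with string labels
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

# to_frame() makes a one-column DataFrame; reset_index() moves the
# index into a regular column (named "index" when the index is unnamed)
df = ser.to_frame(name="value").reset_index()
columns = list(df.columns)
```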
Combine ser1 and ser2 to form a dataframe.
df = pd.concat([ser1, ser2], axis=1)
# or
df = pd.DataFrame({'col1': ser1, 'col2': ser2})
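The two approaches differ mainly in how the columns are named. A small sketch with made-up series: concat on unnamed series produces integer column labels, while the dictionary form uses the keys.

```python
import pandas as pd

# Hypothetical inputs sharing the default 0..2 index
ser1 = pd.Series(list("abc"))
ser2 = pd.Series([0, 1, 2])

# Unnamed series concatenated side by side get columns 0 and 1
df_concat = pd.concat([ser1, ser2], axis=1)

# Dictionary keys become the column names
df_dict = pd.DataFrame({"col1": ser1, "col2": ser2})
```

Both forms align rows on the index, so series with different index labels would produce NaN where a label is missing from one of them.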
Give a name to the series ser calling it ‘alphabets’.
ser.name = 'alphabets'
From ser1 remove items present in ser2.
ser1[~ser1.isin(ser2)]
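A worked example with made-up values: isin builds a boolean mask of the items of ser1 that appear in ser2, and ~ inverts it so only the others are kept.

```python
import pandas as pd

# Hypothetical inputs that overlap on 4 and 5
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

mask = ser1.isin(ser2)   # True where the value also occurs in ser2
remaining = ser1[~mask]  # keep only values absent from ser2
```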
Get all items of ser1 and ser2 not common to both.
ser_u = pd.Series(np.union1d(ser1, ser2)) # union
ser_i = pd.Series(np.intersect1d(ser1, ser2)) # intersect
ser_u[~ser_u.isin(ser_i)]
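The union-minus-intersection steps can be checked on a small made-up pair of series. NumPy's setxor1d computes the same symmetric difference in one call; it is shown here as an alternative, not as the method the card intends.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs that overlap on 4 and 5
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

ser_u = pd.Series(np.union1d(ser1, ser2))      # all distinct values
ser_i = pd.Series(np.intersect1d(ser1, ser2))  # values in both
not_common = ser_u[~ser_u.isin(ser_i)]         # union minus intersection

# Alternative: symmetric difference in a single call
via_setxor = np.setxor1d(ser1, ser2)
```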
Compute the minimum, 25th percentile, median, 75th, and maximum of ser.
ser = pd.Series(np.random.normal(10, 5, 25))
np.percentile(ser, q=[0, 25, 50, 75, 100])
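Because the series on the card is random, its output changes on every run. The sketch below uses a fixed, made-up series so the five numbers are easy to verify by hand, and compares the result with Series.quantile, which takes fractions (0 to 1) instead of percentages.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data: 1..9, chosen so the quantiles are whole numbers
ser = pd.Series(np.arange(1, 10))

# Minimum, 25th percentile, median, 75th percentile, maximum
five_num = np.percentile(ser, q=[0, 25, 50, 75, 100])

# Same summary through pandas, with q given as fractions
same = ser.quantile([0, 0.25, 0.5, 0.75, 1.0]).to_numpy()
```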
Calculate the frequency counts of each unique value in ser.
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))
ser.value_counts()
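A deterministic sketch with a made-up series, since the random one on the card gives different counts each run. value_counts sorts by count in descending order, and normalize=True returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical fixed data: b appears 3 times, a twice, c once
ser = pd.Series(list("aabbbc"))

counts = ser.value_counts()                # sorted, most frequent first
shares = ser.value_counts(normalize=True)  # fractions that sum to 1
```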
From ser, keep the top 2 most frequent items as they are and replace everything else with 'Other'.
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
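The one-liner above modifies ser in place. A non-mutating variant, sketched here on made-up data, uses Series.where, which keeps values where the condition is true and substitutes the second argument elsewhere. The names top2 and result are illustrative.

```python
import pandas as pd

# Hypothetical fixed data: b appears 4 times, a 3 times, the rest once
ser = pd.Series(list("aaabbbbcde"))

# value_counts() is sorted by frequency, so the first two labels are the top 2
top2 = ser.value_counts().index[:2]

# Keep values in top2, replace the rest; ser itself is left unchanged
result = ser.where(ser.isin(top2), "Other")
```

If several values tie for second place, which of them lands in the top 2 should not be relied upon.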
Bin the series ser into 10 equal-sized groups (deciles) and replace the values with the bin name.
pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1],
        labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th'])
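A sketch on a made-up series of 100 distinct values, where each decile should receive exactly 10 items. Passing the integer q=10 is equivalent to listing the eleven evenly spaced quantile edges.

```python
import pandas as pd

# Hypothetical fixed data: the integers 1..100
ser = pd.Series(range(1, 101))

labels = ["1st", "2nd", "3rd", "4th", "5th",
          "6th", "7th", "8th", "9th", "10th"]

# q=10 asks for ten quantile-based bins; labels replace the interval names
binned = pd.qcut(ser, q=10, labels=labels)
bin_sizes = binned.value_counts(sort=False).tolist()
```

The result is a categorical series, so the labels keep their order from "1st" to "10th".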