groupby pandas Flashcards
create a dataframe with column A, B, C and D. C and D have random numbers and A and B has words.
we then want to group by column A and B.
df = pd.DataFrame({‘A’ : [‘foo’, ‘bar’, ‘foo’, ‘bar’,
…: ‘foo’, ‘bar’, ‘foo’, ‘foo’],
…: ‘B’ : [‘one’, ‘one’, ‘two’, ‘three’,
…: ‘two’, ‘two’, ‘one’, ‘three’],
…: ‘C’ : np.random.randn(8),
…: ‘D’ : np.random.randn(8)})
df2 = pd.DataFrame({‘X’ : [‘B’, ‘B’, ‘A’, ‘A’], ‘Y’ : [1, 2, 3, 4]})
sum group X
take group X and form subset group A and then do the same with b.
In [13]: df2 = pd.DataFrame({‘X’ : [‘B’, ‘B’, ‘A’, ‘A’], ‘Y’ : [1, 2, 3, 4]})
In [14]: df2.groupby([‘X’]).sum()
we have data below we want to group by months so that it would show all the time increments for each month.
Date abc xyz 01-Jun-13 100 200 03-Jun-13 -20 50 15-Aug-13 40 -5 20-Jan-14 25 15 21-Feb-14 60 80
In [11]: df1 Out[11]: abc xyz Date 2013-06-01 100 200 2013-06-03 -20 50 2013-08-15 40 -5 2014-01-20 25 15 2014-02-21 60 80
In [12]: g = df1.groupby(pd.TimeGrouper(“M”)) # DataFrameGroupBy (grouped by Month)
In [13]: g.sum() Out[13]: abc xyz Date 2013-06-30 80 250 2013-07-31 NaN NaN 2013-08-31 40 -5 2013-09-30 NaN NaN 2013-10-31 NaN NaN 2013-11-30 NaN NaN 2013-12-31 NaN NaN 2014-01-31 25 15 2014-02-28 60 80
In [14]: df1.resample("M", how='sum') # the same Out[14]: abc xyz Date 2013-06-30 40 125 2013-07-31 NaN NaN 2013-08-31 40 -5 2013-09-30 NaN NaN 2013-10-31 NaN NaN 2013-11-30 NaN NaN 2013-12-31 NaN NaN 2014-01-31 25 15 2014-02-28 60 80