groupby pandas Flashcards

1
Q

create a dataframe with column A, B, C and D. C and D have random numbers and A and B has words.

we then want to group by column A and B.

A

df = pd.DataFrame({‘A’ : [‘foo’, ‘bar’, ‘foo’, ‘bar’,
…: ‘foo’, ‘bar’, ‘foo’, ‘foo’],
…: ‘B’ : [‘one’, ‘one’, ‘two’, ‘three’,
…: ‘two’, ‘two’, ‘one’, ‘three’],
…: ‘C’ : np.random.randn(8),
…: ‘D’ : np.random.randn(8)})

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

df2 = pd.DataFrame({‘X’ : [‘B’, ‘B’, ‘A’, ‘A’], ‘Y’ : [1, 2, 3, 4]})

sum group X

take group X and form subset group A and then do the same with b.

A

In [13]: df2 = pd.DataFrame({‘X’ : [‘B’, ‘B’, ‘A’, ‘A’], ‘Y’ : [1, 2, 3, 4]})

In [14]: df2.groupby([‘X’]).sum()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

we have data below we want to group by months so that it would show all the time increments for each month.

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80
A
In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80

In [12]: g = df1.groupby(pd.TimeGrouper(“M”)) # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80
In [14]: df1.resample("M", how='sum')  # the same
Out[14]:
            abc  xyz
Date
2013-06-30   40  125
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80
How well did you know this?
1
Not at all
2
3
4
5
Perfectly