More Pandas Flashcards

1
Q

Rename categories

A

import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

s
Out[68]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

new_categories = ["Group %s" % g for g in s.cat.categories]

s = s.cat.rename_categories(new_categories)

s
0    Group a
1    Group b
2    Group c
3    Group a
dtype: category
Categories (3, object): ['Group a', 'Group b', 'Group c']
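rename_categories also accepts a dict that maps old labels to new ones, or a callable applied to each category. A minimal sketch, assuming pandas 1.x or later (the variable names are illustrative only):

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Dict form: only the listed categories are renamed; "c" keeps its label
by_dict = s.cat.rename_categories({"a": "alpha", "b": "beta"})

# Callable form: the function is applied to every existing category
by_func = s.cat.rename_categories(lambda g: g.upper())

print(list(by_dict.cat.categories))  # ['alpha', 'beta', 'c']
print(list(by_func))                 # ['A', 'B', 'C', 'A']
```

Only the labels change; every value keeps pointing at the same category position.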

2
Q

Add categories

A

s = s.cat.add_categories([4])

s.cat.categories
Out[77]: Index(['Group a', 'Gr…..
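Adding a category only widens the set of allowed values: no data changes, and adding a label that is already a category is an error. A small sketch, assuming a recent pandas version; the series here is made up for illustration:

```python
import pandas as pd

s = pd.Series(["x", "y", "x"], dtype="category")

# Adding a category extends the allowed values without changing any data
s = s.cat.add_categories(["z"])
print(list(s.cat.categories))  # ['x', 'y', 'z']
print(list(s))                 # ['x', 'y', 'x']  (data unchanged)

# Adding a label that is already a category raises ValueError
try:
    s.cat.add_categories(["x"])
except ValueError as err:
    print("ValueError:", err)
```

The new category "z" has no occurrences yet, so it would show up with a count of 0 in value_counts().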

3
Q

Remove specific categories, or only the unused ones

A

In [79]: s = s.cat.remove_categories([4])

In [80]: s
Out[80]:
0    Group a
1    Group b
2    Group c
3    Group a
dtype: category
Categories (3, object): ['Group a', 'Group b', 'Group c']

In [81]: s = pd.Series(pd.Categorical(["a", "b", "a"], categories=["a", "b", "c", "d"]))

In [82]: s
Out[82]:
0    a
1    b
2    a
dtype: category
Categories (4, object): ['a', 'b', 'c', 'd']

In [83]: s.cat.remove_unused_categories()
Out[83]:
0    a
1    b
2    a
dtype: category
Categories (2, object): ['a', 'b']
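The difference between the two methods is easiest to see when a removed category is still in use: remove_categories turns those values into NaN, while remove_unused_categories never touches existing data. A sketch assuming pandas 1.x or later, with illustrative data:

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Values whose category is removed become NaN; the category itself disappears
trimmed = s.cat.remove_categories(["c"])
print(list(trimmed.cat.categories))  # ['a', 'b']
print(int(trimmed.isna().sum()))     # 1

# remove_unused_categories only drops categories with no occurrences,
# so it never turns existing values into NaN
padded = s.cat.add_categories(["d"])
cleaned = padded.cat.remove_unused_categories()
print(list(cleaned.cat.categories))  # ['a', 'b', 'c']
```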

4
Q

Set categories

A

In [84]: s = pd.Series(["one", "two", "four", "-"], dtype="category")

In [85]: s
Out[85]:
0     one
1     two
2    four
3       -
dtype: category
Categories (4, object): ['-', 'four', 'one', 'two']

In [86]: s = s.cat.set_categories(["one", "two", "three", "four"])

In [87]: s
Out[87]:
0     one
1     two
2    four
3     NaN
dtype: category
Categories (4, object): ['one', 'two', 'three', 'four']
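set_categories can also reorder the categories and mark them as ordered in the same call, which changes how comparisons and min/max behave. A sketch, assuming pandas 1.x or later; the level names are invented for illustration:

```python
import pandas as pd

s = pd.Series(["low", "high", "mid", "n/a"], dtype="category")

# Drop, reorder and mark as ordered in one call;
# "n/a" is not in the new list, so that value becomes NaN
levels = s.cat.set_categories(["low", "mid", "high"], ordered=True)

print(list(levels.cat.categories))  # ['low', 'mid', 'high']
print(levels.cat.ordered)           # True
# max() follows the category order (low < mid < high), not alphabetical order
print(levels.max())                 # high
```

Alphabetically the largest label would be "mid"; the ordered categorical reports "high" because it compares by category position.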

5
Q

value_counts and categoricals

A

Series methods such as Series.value_counts() include every category, even categories that never occur in the data (they get a count of 0):

s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))

s.value_counts()
Out[131]:
c    2
a    1
b    1
d    0
Name: count, dtype: int64
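The zero counts only appear because of the categorical dtype; the same values stored as plain strings report only what actually occurs. A minimal comparison, assuming a recent pandas version (variable names are illustrative):

```python
import pandas as pd

values = ["a", "b", "c", "c"]
as_cat = pd.Series(pd.Categorical(values, categories=["c", "a", "b", "d"]))
as_obj = pd.Series(values)  # plain object dtype, no declared categories

cat_counts = as_cat.value_counts()
obj_counts = as_obj.value_counts()

# The categorical result includes the unobserved category "d" with a count of 0;
# the object-dtype result only lists values that actually occur
print(cat_counts["d"])          # 0
print("d" in obj_counts.index)  # False
```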

6
Q

groupby, mean and unobserved categories

A

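With observed=False, groupby includes every category of a categorical key, even categories with no rows ("d" below, and "c" in df2); their means are NaN. Recent pandas versions (2.1 and later) warn when observed is left unset because its default is changing to True, so passing it explicitly is safer.

In [135]: cats = pd.Categorical(
   .....:     ["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"]
   .....: )
   .....:

In [136]: df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

In [137]: df.groupby("cats", observed=False).mean()
Out[137]:
      values
cats
a        1.0
b        2.0
c        4.0
d        NaN

In [138]: cats2 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])

In [139]: df2 = pd.DataFrame(
   .....:     {
   .....:         "cats": cats2,
   .....:         "B": ["c", "d", "c", "d"],
   .....:         "values": [1, 2, 3, 4],
   .....:     }
   .....: )
   .....:

In [140]: df2.groupby(["cats", "B"], observed=False).mean()
Out[140]:
        values
cats B
a    c     1.0
     d     2.0
b    c     3.0
     d     4.0
c    c     NaN
     d     NaN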
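Comparing observed=True with observed=False on the same data makes the difference explicit. A sketch reusing the data from this card, assuming pandas 1.x or later; selecting the "values" column before mean() is just a choice to get a Series back:

```python
import pandas as pd

cats = pd.Categorical(
    ["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"]
)
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# observed=True keeps only categories that appear in the data
seen = df.groupby("cats", observed=True)["values"].mean()
# observed=False keeps every declared category; empty groups get NaN
everything = df.groupby("cats", observed=False)["values"].mean()

print(list(seen.index))         # ['a', 'b', 'c']
print(list(everything.index))   # ['a', 'b', 'c', 'd']
print(everything.isna().sum())  # 1
```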
