Pandas Concat and Append Flashcards

1
Q

import numpy as np

x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]
np.concatenate([x, y, z])

A

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

2
Q

x = [[1, 2],
     [3, 4]]
np.concatenate([x, x], axis=1)

A

The first argument is a list or tuple of arrays to concatenate. Additionally, it takes an axis keyword that allows you to specify the axis along which the result will be concatenated:

array([[1, 2, 1, 2],
       [3, 4, 3, 4]])
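
As a small sketch of how the axis keyword changes the result, the same two-dimensional input can be joined along either axis. The variable names rows and cols are only illustrative.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# axis=0 (the default) stacks the arrays on top of each other
rows = np.concatenate([x, x], axis=0)

# axis=1 places the arrays side by side
cols = np.concatenate([x, x], axis=1)

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)
```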

3
Q

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False,
          copy=True)

A

Pandas has a function, pd.concat(), which has a syntax similar to np.concatenate but takes a number of additional options. The signature shown reflects older pandas releases: the join_axes parameter was removed in pandas 1.0.

4
Q

df1 = make_df('AB', [1, 2])
df2 = make_df('AB', [3, 4])
display('df1', 'df2', 'pd.concat([df1, df2])')

A

df1
    A   B
1  A1  B1
2  A2  B2

df2
    A   B
3  A3  B3
4  A4  B4

pd.concat([df1, df2])
    A   B
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
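
The cards never define make_df. The sketch below assumes it is a small helper that fills each cell with the column letter followed by the index label, which matches the output shown above. This definition is a guess, not part of the original card.

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

df1 = make_df("AB", [1, 2])
df2 = make_df("AB", [3, 4])

# Row-wise concatenation: df2's rows follow df1's rows
stacked = pd.concat([df1, df2])
print(stacked)
```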
5
Q

pd.concat([ser1, ser2])

A

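Simple concatenation of two Series: the values of ser2 follow the values of ser1, and every value keeps its original index label.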
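
The card does not say what ser1 and ser2 contain. As a minimal sketch, assume two short Series with non-overlapping integer labels:

```python
import pandas as pd

# Assumed example data; the card does not define ser1 or ser2
ser1 = pd.Series(["A", "B", "C"], index=[1, 2, 3])
ser2 = pd.Series(["D", "E", "F"], index=[4, 5, 6])

# The result is a single Series with both sets of labels preserved
result = pd.concat([ser1, ser2])
print(result)
```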

6
Q

pd.concat([df3, df4], axis=1)

A

By default, the concatenation takes place row-wise within the DataFrame (i.e., axis=0). Like np.concatenate, pd.concat allows specification of an axis along which concatenation will take place.
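
A possible illustration of column-wise concatenation follows. The contents of df3 and df4 are not given on the card, so the sketch assumes two frames that share an index but have different columns, built with a hypothetical make_df helper.

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

# Assumed inputs: same index, different columns
df3 = make_df("AB", [0, 1])
df4 = make_df("CD", [0, 1])

# axis=1 aligns on the index and places the columns side by side
wide = pd.concat([df3, df4], axis=1)
print(wide)
```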

7
Q

x = make_df('AB', [0, 1])
y = make_df('AB', [2, 3])
y.index = x.index  # make duplicate indices!
display('x', 'y', 'pd.concat([x, y])')

A

One important difference between np.concatenate and pd.concat is that Pandas concatenation preserves indices, even if the result will have duplicate indices. In this example, y is given the same index as x, so the labels 0 and 1 each appear twice in the result.
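
To make the duplicate labels visible, the sketch below rebuilds x and y with an assumed make_df helper (not defined on the cards) and inspects the resulting index.

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

x = make_df("AB", [0, 1])
y = make_df("AB", [2, 3])
y.index = x.index  # y now reuses the labels 0 and 1

result = pd.concat([x, y])

print(list(result.index))      # labels are kept as-is, including repeats
print(result.index.is_unique)  # False when labels repeat
print(result.loc[0])           # selecting label 0 returns two rows
```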

8
Q

try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as e:
    print("ValueError:", e)

A

If you’d like to simply verify that the indices in the result of pd.concat() do not overlap, you can specify the verify_integrity flag. With this set to True, the concatenation will raise an exception if there are duplicate indices. Here is an example, where for clarity we’ll catch and print the error message:
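
A self-contained version of this check might look like the following. The make_df helper and the raised flag are illustrative assumptions, added so the behavior can be verified without relying on printed output.

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

x = make_df("AB", [0, 1])
y = make_df("AB", [2, 3])
y.index = x.index  # force overlapping labels

raised = False
try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as e:
    raised = True
    print("ValueError:", e)

# Without overlapping labels the same call succeeds
ok = pd.concat([x, make_df("AB", [2, 3])], verify_integrity=True)
```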

9
Q

display('x', 'y', 'pd.concat([x, y], ignore_index=True)')

A

Sometimes the index itself does not matter, and you would prefer it to simply be ignored. This option can be specified using the ignore_index flag. With this set to True, the concatenation discards the input labels and creates a new integer index (0, 1, 2, ...) for the result.
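
As a brief sketch (again assuming a make_df helper that the cards do not define), ignore_index=True replaces the duplicated labels with a fresh integer range:

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

x = make_df("AB", [0, 1])
y = make_df("AB", [2, 3])
y.index = x.index  # duplicate labels 0 and 1

# The original labels are dropped; rows are renumbered from 0
result = pd.concat([x, y], ignore_index=True)
print(result)
```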

10
Q

pd.concat([x, y], keys=['x', 'y'])

A

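Another option is to use the keys option to specify a label for each data source. Because x and y are DataFrames, the result is a hierarchically indexed (MultiIndex) DataFrame whose outer level holds the keys and whose inner level holds the original labels.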
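
The sketch below shows one way to build and then query such a result. The make_df helper is an assumption, as elsewhere in these cards.

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

x = make_df("AB", [0, 1])
y = make_df("AB", [2, 3])
y.index = x.index  # duplicate labels 0 and 1

# Each source gets its own outer index level
result = pd.concat([x, y], keys=["x", "y"])
print(result)

# Selecting an outer key recovers the rows that came from that source
from_y = result.loc["y"]
print(from_y)
```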

11
Q

pd.concat([df5, df6], join='inner')

A

By default, the entries for which no data is available are filled with NA values. To change this, we can set the join parameter of pd.concat (older pandas releases also accepted join_axes). By default, the join is a union of the input columns (join='outer'), but we can change this to an intersection of the columns using join='inner'.
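
To compare the two join modes, the sketch below assumes df5 and df6 share only some of their columns. Their contents are not given on the card, and make_df is a hypothetical helper.

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

# Assumed inputs: columns B and C are shared, A and D are not
df5 = make_df("ABC", [1, 2])
df6 = make_df("BCD", [3, 4])

outer = pd.concat([df5, df6])                # union of columns, gaps become NaN
inner = pd.concat([df5, df6], join="inner")  # only the shared columns survive

print(outer)
print(inner)
```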

12
Q

pd.concat([df5, df6], join_axes=[df5.columns])

A

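Another option is to directly specify the index of the remaining columns using the join_axes argument, which takes a list of index objects. Here we specify that the returned columns should be the same as those of the first input. Note that join_axes was removed in pandas 1.0, so this call only works on older releases; on current versions, reindexing the concatenated result gives the same columns.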
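
On pandas 1.0 and later, one way to get the same effect is to concatenate first and then reindex the columns. This is a sketch of a replacement, not the card's original call, and df5, df6, and make_df are assumed as before.

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

# Assumed inputs: columns B and C are shared, A and D are not
df5 = make_df("ABC", [1, 2])
df6 = make_df("BCD", [3, 4])

# Keep only df5's columns; rows from df6 get NaN in column A
result = pd.concat([df5, df6]).reindex(columns=df5.columns)
print(result)
```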

13
Q

df1.append(df2)

A

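Because direct array concatenation is so common, Series and DataFrame objects historically had an append method that accomplished the same thing in fewer keystrokes. For example, rather than calling pd.concat([df1, df2]), you could simply call df1.append(df2). Unlike list.append, this returned a new object and left df1 unchanged. The method was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat([df1, df2]) instead.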
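
A minimal sketch of the pd.concat equivalent follows, with df1 and df2 built by an assumed make_df helper. The second half shows a common pattern for combining many pieces: collect them in a list and concatenate once, rather than growing a frame step by step.

```python
import pandas as pd

def make_df(cols, ind):
    """Assumed helper: each cell is the column letter plus the index label."""
    return pd.DataFrame({c: [f"{c}{i}" for i in ind] for c in cols}, index=ind)

df1 = make_df("AB", [1, 2])
df2 = make_df("AB", [3, 4])

# Equivalent of the removed df1.append(df2); df1 itself is not modified
combined = pd.concat([df1, df2])

# Combining many frames: build a list, then concatenate a single time
frames = [make_df("AB", [i]) for i in range(3)]
all_rows = pd.concat(frames)
print(all_rows)
```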
