Pandas Evaluation Flashcards

1
Q

import numexpr

mask_numexpr = numexpr.evaluate('(x > 0.5) & (y < 0.5)')

A

The Numexpr library gives you the ability to compute this type of compound expression element by element, without the need to allocate full intermediate arrays.

2
Q

np.allclose()

A

numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
Returns True if two arrays are element-wise equal within a tolerance.

The tolerance values are positive, typically very small numbers. The relative difference (rtol * abs(b)) and the absolute difference atol are added together and compared against the absolute difference between a and b; that is, every element must satisfy abs(a - b) <= atol + rtol * abs(b).

With the default equal_nan=False, False is returned if either array contains one or more NaNs; with equal_nan=True, NaNs in the same positions are treated as equal. Infs are treated as equal if they are in the same place and of the same sign in both arrays.
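As a minimal sketch of the tolerance rule described in this card, the check can be written out by hand and compared with np.allclose(). The arrays a, b, and with_nan are made-up illustration values, not part of the original card.

```python
import numpy as np

# Illustrative inputs (assumed for this sketch).
a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9  # perturbation far below the default tolerances

# Built-in check with the default rtol=1e-05 and atol=1e-08.
close = np.allclose(a, b)

# The same test written out: |a - b| <= atol + rtol * |b| for every element.
manual = bool(np.all(np.abs(a - b) <= 1e-08 + 1e-05 * np.abs(b)))

# NaNs compare unequal unless equal_nan=True is passed.
with_nan = np.array([1.0, np.nan])
nan_default = np.allclose(with_nan, with_nan)
nan_equal = np.allclose(with_nan, with_nan, equal_nan=True)

print(close, manual, nan_default, nan_equal)
```

Note that the rule is not symmetric in a and b, because only abs(b) scales the relative tolerance.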

3
Q
result1 = (df1 < df2) & (df2 <= df3) & (df3 != df4)
result2 = pd.eval('df1 < df2 <= df3 != df4')
result1 = (df1 < 0.5) & (df2 < 0.5) | (df3 < df4)
result2 = pd.eval('(df1 < 0.5) & (df2 < 0.5) | (df3 < df4)')
A

pd.eval() supports all comparison operators, including chained expressions (the first pair of lines in the question).

pd.eval() also supports the & and | bitwise operators (the second pair of lines in the question).
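The following is a small sketch, under assumed random test data, that checks the pd.eval() forms against their plain pandas equivalents. The frame shapes and the seed are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Four small random frames (shape and seed are assumptions for this sketch).
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((5, 3)))
df2 = pd.DataFrame(rng.random((5, 3)))
df3 = pd.DataFrame(rng.random((5, 3)))
df4 = pd.DataFrame(rng.random((5, 3)))

# Chained comparison: plain pandas vs. pd.eval().
chained_plain = (df1 < df2) & (df2 <= df3) & (df3 != df4)
chained_eval = pd.eval('df1 < df2 <= df3 != df4')
same_chained = np.array_equal(chained_plain, chained_eval)

# Bitwise operators: plain pandas vs. pd.eval().
bitwise_plain = (df1 < 0.5) & (df2 < 0.5) | (df3 < df4)
bitwise_eval = pd.eval('(df1 < 0.5) & (df2 < 0.5) | (df3 < df4)')
same_bitwise = np.array_equal(bitwise_plain, bitwise_eval)

print(same_chained, same_bitwise)
```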

4
Q

result3 = pd.eval('(df1 < 0.5) and (df2 < 0.5) or (df3 < df4)')

A

In addition, pd.eval() supports the literal keywords "and" and "or" in Boolean expressions.
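A brief sketch, again on assumed random data, suggesting that the keyword form and the symbol form give the same result inside pd.eval():

```python
import numpy as np
import pandas as pd

# Random frames used only for illustration.
rng = np.random.default_rng(1)
df1 = pd.DataFrame(rng.random((5, 3)))
df2 = pd.DataFrame(rng.random((5, 3)))
df3 = pd.DataFrame(rng.random((5, 3)))
df4 = pd.DataFrame(rng.random((5, 3)))

# Keyword form, as on the card.
keyword_form = pd.eval('(df1 < 0.5) and (df2 < 0.5) or (df3 < df4)')

# Symbol form with the bitwise operators.
symbol_form = pd.eval('(df1 < 0.5) & (df2 < 0.5) | (df3 < df4)')

same_result = np.array_equal(keyword_form, symbol_form)
print(same_result)
```

Outside pd.eval(), writing "and" or "or" between two DataFrames raises an error, so the keyword form is specific to the evaluated string.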

5
Q
result1 = df2.T[0] + df3.iloc[1]
result2 = pd.eval('df2.T[0] + df3.iloc[1]')
A

pd.eval() supports access to object attributes via the obj.attr syntax, and to indexes via the obj[index] syntax.
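A short sketch that runs the card's expression on assumed random frames and compares the two results; the shapes are arbitrary and chosen only so that the labels line up.

```python
import numpy as np
import pandas as pd

# Illustrative frames with default integer labels (assumed for this sketch).
rng = np.random.default_rng(2)
df2 = pd.DataFrame(rng.random((4, 3)))
df3 = pd.DataFrame(rng.random((4, 3)))

# Attribute access (.T, .iloc) and indexing ([0], [1]) in plain pandas.
plain = df2.T[0] + df3.iloc[1]

# The same expression evaluated from a string.
evaluated = pd.eval('df2.T[0] + df3.iloc[1]')

same_values = np.allclose(plain, evaluated)
print(same_values)
```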

6
Q

result3 = df.eval('(A + B) / (C - 1)')
vs
result2 = pd.eval("(df.A + df.B) / (df.C - 1)")
(using pandas)

A

Just as Pandas has a top-level pd.eval() function, DataFrames have an eval() method that works in similar ways. The benefit of the eval() method is that columns can be referred to by name, without the df. prefix.
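As a sketch with an assumed random DataFrame (the columns A, B, C match the card; the data does not come from it), both spellings can be compared directly:

```python
import numpy as np
import pandas as pd

# Illustrative data; values lie in [0, 1), so C - 1 is never zero.
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.random((1000, 3)), columns=['A', 'B', 'C'])

# Top-level function: columns must be spelled as attributes of df.
via_function = pd.eval('(df.A + df.B) / (df.C - 1)')

# DataFrame method: columns are referred to by bare name.
via_method = df.eval('(A + B) / (C - 1)')

same_values = np.allclose(via_function, via_method)
print(same_values)
```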

7
Q

df.eval('D = (A + B) / C', inplace=True)

A

We can use df.eval() to create a new column 'D' and assign to it a value computed from the other columns.

The same assignment syntax can also modify an existing column.
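A minimal sketch of both uses, creating a column and then overwriting it, on assumed data. The value range 1 to 2 is chosen only to keep the division away from zero.

```python
import numpy as np
import pandas as pd

# Illustrative data in [1, 2) so that dividing by C is safe.
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.uniform(1.0, 2.0, (5, 3)), columns=['A', 'B', 'C'])

# Create a new column D in place.
df.eval('D = (A + B) / C', inplace=True)
columns_after_create = list(df.columns)

# Overwrite the existing column D with a different expression.
df.eval('D = (A - B) / C', inplace=True)
overwritten = np.allclose(df['D'], (df['A'] - df['B']) / df['C'])

print(columns_after_create, overwritten)
```

Without inplace=True, df.eval() returns a new DataFrame containing the assigned column and leaves df unchanged.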

8
Q
column_mean = df.mean(1)
result1 = df['A'] + column_mean
result2 = df.eval('A + @column_mean')
A

The DataFrame.eval() method supports an additional syntax that lets it work with local Python variables.

The @ character here marks a variable name rather than a column name, and lets you efficiently evaluate expressions involving the two "namespaces": the namespace of columns, and the namespace of Python objects. Notice that this @ character is supported by the DataFrame.eval() method but not by the top-level pd.eval() function, which has access to only the Python namespace.
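A small sketch, on an assumed random DataFrame, of mixing a column name with a local Python variable:

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the original card).
rng = np.random.default_rng(5)
df = pd.DataFrame(rng.random((100, 3)), columns=['A', 'B', 'C'])

# A local variable: the mean across the columns of each row.
column_mean = df.mean(axis=1)

# Plain pandas vs. eval() with @ marking the local variable.
plain = df['A'] + column_mean
evaluated = df.eval('A + @column_mean')

same_values = np.allclose(plain, evaluated)
print(same_values)
```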

9
Q

result2 = df.query('A < 0.5 and B < 0.5')

equivalent to
result1 = df[(df.A < 0.5) & (df.B < 0.5)]
result2 = pd.eval('df[(df.A < 0.5) & (df.B < 0.5)]')

A

DataFrame.query() filters rows using a string expression over the column names. For large DataFrames it can be a more efficient computation than the masking expression, and it is much easier to read and understand.
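A quick sketch, on assumed random data, checking that query() selects the same rows as the masking expression:

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the original card).
rng = np.random.default_rng(6)
df = pd.DataFrame(rng.random((1000, 3)), columns=['A', 'B', 'C'])

# Boolean-mask filtering.
masked = df[(df.A < 0.5) & (df.B < 0.5)]

# The same filter written as a query string.
queried = df.query('A < 0.5 and B < 0.5')

same_rows = masked.equals(queried)
print(same_rows)
```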

10
Q
Cmean = df['C'].mean()
result1 = df[(df.A < Cmean) & (df.B < Cmean)]
result2 = df.query('A < @Cmean and B < @Cmean')
A

The query() method also accepts the @ prefix to mark local variables, such as Cmean here, so a threshold computed in Python can be used inside the query string.
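A sketch of the same comparison with a threshold held in a local variable; the data is assumed, and Cmean follows the card's naming.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the original card).
rng = np.random.default_rng(7)
df = pd.DataFrame(rng.random((1000, 3)), columns=['A', 'B', 'C'])

# Threshold computed in ordinary Python.
Cmean = df['C'].mean()

# Mask form vs. query form, with @ pulling in the local variable.
masked = df[(df.A < Cmean) & (df.B < Cmean)]
queried = df.query('A < @Cmean and B < @Cmean')

same_rows = masked.equals(queried)
print(same_rows)
```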

11
Q

x = df[(df.A < 0.5) & (df.B < 0.5)]

A
equivalent to:
tmp1 = df.A < 0.5
tmp2 = df.B < 0.5
tmp3 = tmp1 & tmp2
x = df[tmp3]
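The sketch below, on assumed data, spells out the temporaries and checks two things: the stepwise result matches the one-line form, and each temporary is a full-length Boolean Series. That is the kind of intermediate allocation the expression-based tools in this deck aim to avoid.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the original card).
rng = np.random.default_rng(8)
df = pd.DataFrame(rng.random((1000, 3)), columns=['A', 'B', 'C'])

# Step-by-step form: every line creates a full-length intermediate.
tmp1 = df.A < 0.5
tmp2 = df.B < 0.5
tmp3 = tmp1 & tmp2
x_steps = df[tmp3]

# One-line form from the card.
x_inline = df[(df.A < 0.5) & (df.B < 0.5)]

same_rows = x_steps.equals(x_inline)
temp_lengths = (len(tmp1), len(tmp2), len(tmp3))
print(same_rows, temp_lengths)
```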