Pandas Evaluation Flashcards
import numexpr
mask_numexpr = numexpr.evaluate('(x > 0.5) & (y < 0.5)')
The Numexpr library gives you the ability to compute this type of compound expression element by element, without the need to allocate full intermediate arrays.
np.allclose()
numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
Returns True if two arrays are element-wise equal within a tolerance.
The tolerance values are positive, typically very small numbers. The relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b.
If either array contains one or more NaNs, False is returned (unless equal_nan=True is passed). Infs are treated as equal if they are in the same place and of the same sign in both arrays.
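A minimal sketch of the tolerance check described above, written out by hand next to np.allclose(). The arrays a and b are made-up values for illustration, and the defaults for rtol and atol are assumed.

```python
import numpy as np

# Illustrative values (not from the flashcards): b differs slightly from a.
a = np.array([1.0, 100.0])
b = np.array([1.0 + 1e-9, 100.0 + 5e-4])

rtol, atol = 1e-05, 1e-08  # the documented defaults

# Element-wise criterion: |a - b| <= atol + rtol * |b|
manual = np.abs(a - b) <= atol + rtol * np.abs(b)
close = np.allclose(a, b, rtol=rtol, atol=atol)

# NaNs compare unequal unless equal_nan=True is passed.
nan_default = np.allclose([np.nan], [np.nan])
nan_equal = np.allclose([np.nan], [np.nan], equal_nan=True)
```

Note that the criterion scales with abs(b), so it is not symmetric in a and b.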
result1 = (df1 < df2) & (df2 <= df3) & (df3 != df4)
result2 = pd.eval('df1 < df2 <= df3 != df4')
result1 = (df1 < 0.5) & (df2 < 0.5) | (df3 < df4)
result2 = pd.eval('(df1 < 0.5) & (df2 < 0.5) | (df3 < df4)')
pd.eval() supports all arithmetic operators. For example:
pd.eval() supports all comparison operators, including chained expressions:
result3 = pd.eval('(df1 < 0.5) and (df2 < 0.5) or (df3 < df4)')
In addition, it supports the use of the literal and and or in Boolean expressions:
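To check these cards against each other, here is a small sketch that builds four DataFrames and confirms that the chained and the literal and/or forms give the same result as the explicit NumPy-style expressions. The random data, the seed, and the variable names other than df1 to df4 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Four small DataFrames of random values (seeded for repeatability).
rng = np.random.RandomState(42)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(100, 3)) for _ in range(4))

# Chained comparison in pd.eval() vs. the explicit form.
explicit = (df1 < df2) & (df2 <= df3) & (df3 != df4)
chained = pd.eval('df1 < df2 <= df3 != df4')

# Bitwise operators vs. the literal `and` / `or` inside pd.eval().
bitwise = (df1 < 0.5) & (df2 < 0.5) | (df3 < df4)
literal = pd.eval('(df1 < 0.5) and (df2 < 0.5) or (df3 < df4)')

same_chained = np.array_equal(explicit.values, chained.values)
same_literal = np.array_equal(bitwise.values, literal.values)
```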
result1 = df2.T[0] + df3.iloc[1]
result2 = pd.eval('df2.T[0] + df3.iloc[1]')
pd.eval() supports access to object attributes via the obj.attr syntax, and indexes via the obj[index] syntax:
result3 = df.eval('(A + B) / (C - 1)')
vs
result2 = pd.eval("(df.A + df.B) / (df.C - 1)")
(using pandas)
Just as Pandas has a top-level pd.eval() function, DataFrames have an eval() method that works in similar ways. The benefit of the eval() method is that columns can be referred to by name, without the df. prefix.
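A short sketch comparing the three spellings of the same expression. The DataFrame contents and the seed are assumptions; only the column names A, B, C come from the cards.

```python
import numpy as np
import pandas as pd

# Random data with the column names used on the cards.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(1000, 3), columns=['A', 'B', 'C'])

plain = (df['A'] + df['B']) / (df['C'] - 1)           # ordinary pandas
top_level = pd.eval("(df.A + df.B) / (df.C - 1)")     # pd.eval(): needs df.
method = df.eval('(A + B) / (C - 1)')                 # df.eval(): bare column names

all_match = np.allclose(plain, top_level) and np.allclose(plain, method)
```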
df.eval('D = (A + B) / C', inplace=True)
We can use df.eval() to create a new column 'D' and assign to it a value computed from the other columns:
(Or modify existing)
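A small worked example of both cases, creating D and then overwriting it. The three input columns hold made-up values chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Tiny hand-written frame (illustrative values).
df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0], 'C': [2.0, 4.0]})

# Create a new column D from existing columns.
df.eval('D = (A + B) / C', inplace=True)
d_created = df['D'].tolist()   # (1+3)/2 and (2+4)/4

# Assigning to the same name again modifies the existing column.
df.eval('D = (A - B) / C', inplace=True)
d_modified = df['D'].tolist()  # (1-3)/2 and (2-4)/4
```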
column_mean = df.mean(1)
result1 = df['A'] + column_mean
result2 = df.eval('A + @column_mean')
The DataFrame.eval() method supports an additional syntax that lets it work with local Python variables.
The @ character here marks a variable name rather than a column name, and lets you efficiently evaluate expressions involving the two "namespaces": the namespace of columns, and the namespace of Python objects. Notice that this @ character is only supported by the DataFrame.eval() method, not by the top-level pd.eval() function, which only has access to the one (Python) namespace.
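A runnable sketch of the two namespaces in one expression: A is looked up among the columns, while @column_mean is looked up among the local Python variables. The data and seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(5, 3), columns=['A', 'B', 'C'])

# A local Python object (a Series of row means), not a column of df.
column_mean = df.mean(axis=1)

result1 = df['A'] + column_mean            # ordinary pandas
result2 = df.eval('A + @column_mean')      # A: column, @column_mean: local variable

same = np.allclose(result1, result2)
```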
result2 = df.query('A < 0.5 and B < 0.5')
equivalent to
result1 = df[(df.A < 0.5) & (df.B < 0.5)]
result2 = pd.eval('df[(df.A < 0.5) & (df.B < 0.5)]')
In addition to being a more efficient computation, the query() form is much easier to read and understand than the masking expression.
Cmean = df['C'].mean()
result1 = df[(df.A < Cmean) & (df.B < Cmean)]
result2 = df.query('A < @Cmean and B < @Cmean')
The query() method also accepts the @ flag to mark local variables.
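A sketch that checks the masking form and the query() form select the same rows when a local variable is involved. The random data and seed are assumptions; Cmean follows the name used on the card.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1)
df = pd.DataFrame(rng.rand(1000, 3), columns=['A', 'B', 'C'])

# A local Python scalar, referenced inside the query string with @.
Cmean = df['C'].mean()

masked = df[(df.A < Cmean) & (df.B < Cmean)]        # boolean-mask filtering
queried = df.query('A < @Cmean and B < @Cmean')     # same filter as a query string

same_rows = masked.equals(queried)
```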
x = df[(df.A < 0.5) & (df.B < 0.5)]
equivalent to:
tmp1 = df.A < 0.5
tmp2 = df.B < 0.5
tmp3 = tmp1 & tmp2
x = df[tmp3]