Pivot Table Flashcards
pivot table
The pivot table takes simple column-wise data as input and groups the entries into a two-dimensional table that provides a multidimensional summarization of the data.
titanic.pivot_table('survived', index='sex', columns='class')
Equivalent to:
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()
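A minimal sketch of this equivalence, assuming a small invented DataFrame (`toy`, six made-up passengers) in place of the real titanic data, which is not reproduced here:

```python
import pandas as pd

# Hypothetical stand-in for the titanic data: six invented passengers.
toy = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male", "male"],
    "class": ["First", "Third", "Third", "First", "First", "Third"],
    "survived": [1, 1, 0, 0, 1, 0],
})

# Mean survival rate by sex (rows) and class (columns), computed two ways.
by_pivot = toy.pivot_table("survived", index="sex", columns="class")
by_groupby = toy.groupby(["sex", "class"])["survived"].aggregate("mean").unstack()

print(by_pivot)
print(by_groupby)
```

Both printed tables should show the same survival rates; the pivot_table form simply packs the group, aggregate, and unstack steps into one call.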
age = pd.cut(titanic['age'], [0, 18, 80])
titanic.pivot_table('survived', ['sex', age], 'class')
Pivot tables can be specified with multiple levels.
Here, age (binned with pd.cut) is added as a third dimension.
fare = pd.qcut(titanic['fare'], 2)
titanic.pivot_table('survived', ['sex', age], [fare, 'class'])
The same strategy works for the columns.
Here, information on the fare paid is added, using pd.qcut to compute the quantiles automatically.
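The multi-level cards above can be tried end to end with a sketch like the following. The `toy` DataFrame and all of its values are invented for illustration; only the pivot_table, pd.cut, and pd.qcut calls mirror the cards.

```python
import pandas as pd

# Hypothetical stand-in for the titanic data: eight invented passengers.
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "female", "male", "female", "male"],
    "class": ["First", "Third", "First", "Third", "First", "Third", "Third", "First"],
    "age": [10, 30, 40, 5, 50, 25, 16, 60],
    "fare": [80.0, 8.0, 90.0, 7.0, 100.0, 9.0, 10.0, 70.0],
    "survived": [1, 1, 0, 1, 1, 0, 0, 0],
})

age = pd.cut(toy["age"], [0, 18, 80])   # two age bins: (0, 18] and (18, 80]
fare = pd.qcut(toy["fare"], 2)          # two fare bins split at the median

# Rows are grouped by sex and age bin, columns by fare bin and class.
table = toy.pivot_table("survived", index=["sex", age], columns=[fare, "class"])
print(table)
```

The result has a two-level row index and two-level columns; combinations with no passengers in the toy data show up as NaN or are dropped.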
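pd.pivot_table(data, values=None, index=None, columns=None,
aggfunc='mean', fill_value=None, margins=False,
dropna=True, margins_name='All')
The call signature of the top-level pivot_table function (newer pandas versions add further keywords); the DataFrame.pivot_table method takes the same arguments without data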
The aggfunc keyword controls what type of aggregation is applied.
As in the GroupBy, the aggregation specification can be a string representing one of several common choices (e.g., 'sum', 'mean', 'count', 'min', 'max', etc.) or a function that implements an aggregation (e.g., np.sum, min, sum, etc.). Additionally, it can be specified as a dictionary mapping a column to any of the above desired options:
Pivot aggfunc ex:
titanic.pivot_table(index='sex', columns='class',
aggfunc={'survived':sum, 'fare':'mean'})
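A small sketch of the dictionary form, again on an invented `toy` DataFrame rather than the real titanic data. It uses the string 'sum' instead of the builtin sum, which is a substitution made here because recent pandas versions warn about builtin callables.

```python
import pandas as pd

# Hypothetical stand-in for the titanic data: six invented passengers.
toy = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male", "male"],
    "class": ["First", "First", "Third", "First", "Third", "Third"],
    "survived": [1, 1, 0, 0, 1, 0],
    "fare": [80.0, 100.0, 8.0, 90.0, 7.0, 9.0],
})

# No values argument: the dictionary keys select the columns to aggregate,
# and each column gets its own aggregation.
table = toy.pivot_table(index="sex", columns="class",
                        aggfunc={"survived": "sum", "fare": "mean"})
print(table)
```

The columns of the result form a two-level index: the aggregated column name on top ('fare', 'survived') and the class underneath.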
titanic.pivot_table('survived', index='sex', columns='class', margins=True)
Margins: compute totals along each grouping
Total All Values
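To see what margins=True adds, here is a sketch on an invented `toy` DataFrame (not the real titanic data). With the default aggfunc of 'mean', the extra 'All' row and column hold overall means rather than sums.

```python
import pandas as pd

# Hypothetical stand-in for the titanic data: six invented passengers.
toy = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male", "male"],
    "class": ["First", "First", "Third", "First", "Third", "Third"],
    "survived": [1, 1, 0, 0, 1, 0],
})

# margins=True appends an 'All' row and an 'All' column.
table = toy.pivot_table("survived", index="sex", columns="class", margins=True)
print(table)

# The label can be changed with margins_name, e.g. margins_name="Total".
overall = table.loc["All", "All"]   # survival rate across all six passengers
```

The 'All' column gives the rate per sex regardless of class, the 'All' row gives the rate per class regardless of sex, and the corner cell is the overall rate.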
pd.cut(titanic['age'], [0, 18, 80])
Use `cut` when you need to segment and sort data values into bins. This function is also useful for going from a continuous variable to a categorical variable. For example, `cut` could convert ages to groups of age ranges. Supports binning into an equal number of bins, or a pre-specified array of bins.
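A short sketch of both modes of `cut`, using a handful of invented ages. The label names "child" and "adult" are illustrative choices, not part of the original cards.

```python
import pandas as pd

ages = pd.Series([5, 18, 19, 45, 80])   # invented example ages

# Pre-specified bin edges: intervals are right-inclusive by default,
# so 18 falls in (0, 18] and 80 falls in (18, 80].
binned = pd.cut(ages, [0, 18, 80])

# The same bins with readable labels instead of interval objects.
labeled = pd.cut(ages, [0, 18, 80], labels=["child", "adult"])

# An integer asks for that many equal-width bins over the data range.
thirds = pd.cut(ages, 3)

print(binned)
print(labeled)
print(thirds)
```

Because the bins are open on the left, a value equal to the lowest edge (an age of 0 here) would come out as NaN unless include_lowest=True is passed.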
pd.qcut(titanic['fare'], 2)
Signature: pd.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')
Docstring:
Quantile-based discretization function. Discretize variable into
equal-sized buckets based on rank or based on sample quantiles. For example
1000 values for 10 quantiles would produce a Categorical object indicating
quantile membership for each data point.
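The difference between quantile-based and width-based binning is easiest to see side by side. The sketch below uses eight invented, skewed fare values; the labels "q1" through "q4" are illustrative.

```python
import pandas as pd

fares = pd.Series([5.0, 7.0, 8.0, 10.0, 30.0, 50.0, 80.0, 100.0])  # invented fares

# qcut: bin edges are placed at quantiles, so each bin holds
# (roughly) the same number of values.
halves = pd.qcut(fares, 2)
quarters = pd.qcut(fares, 4, labels=["q1", "q2", "q3", "q4"])

# cut with an integer: bins have equal width instead, so skewed data
# ends up unevenly distributed across them.
equal_width = pd.cut(fares, 2)

print(halves.value_counts())
print(equal_width.value_counts())
```

With this skewed sample, qcut puts four fares in each half, while the equal-width split leaves most fares in the lower bin.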