Pandas 4 Granularity Flashcards
If data is a collection of structured information, _____________ is the level your collection is at.
Granularity
Aggregating data to be less granular is called __________
Grouping
It necessarily involves loss of detail
Stacking
Aka reshaping
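Moves data that was formerly in separate rows into separate columns (or the reverse), so no detail is lost; in Pandas, unstack() goes from rows to columns and stack() goes from columns back to rows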
What is the syntax for grouping in Pandas?
DF.groupby('column').sum()
sum() or whatever aggregating function is needed
The grouped column becomes the new index by default; pass as_index=False to groupby to keep it as a regular column
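A minimal sketch of the grouping syntax above, using made-up play-by-play data (the frame and its column names are illustrative assumptions, not from the original notes). It shows the default index behavior next to as_index=False.

```python
import pandas as pd

# Hypothetical play-by-play data; column names are illustrative only.
plays = pd.DataFrame({
    'game_id': [1, 1, 2, 2, 2],
    'yards_gained': [5, 12, 3, 0, 20],
})

# Default: the grouped column becomes the index of the result.
by_game = plays.groupby('game_id').sum()

# as_index=False keeps game_id as an ordinary column instead.
by_game_flat = plays.groupby('game_id', as_index=False).sum()
```

Calling reset_index() on the first result should give the same layout as the second.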
How to return groupby of just columns of interest?
sum_cols = ['col1', 'col2', 'col3']
DF.groupby('game_id')[sum_cols].sum()
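A small sketch of limiting a groupby to the columns of interest, again with invented data. Indexing the groupby with the list before aggregating means only those columns are summed; the extra 'note' column here is a hypothetical example of something we would not want aggregated.

```python
import pandas as pd

# Hypothetical data: three numeric columns plus a text column to leave out.
plays = pd.DataFrame({
    'game_id': [1, 1, 2],
    'col1': [1, 2, 3],
    'col2': [10, 20, 30],
    'col3': [0.5, 0.5, 1.0],
    'note': ['a', 'b', 'c'],
})

sum_cols = ['col1', 'col2', 'col3']

# Select the columns on the groupby object, then aggregate.
totals = plays.groupby('game_id')[sum_cols].sum()
```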
How to use the agg() function?
DF.groupby('game_id').agg({'yards_gained': 'sum',
'Play_id': 'count',
'Intercep': 'sum',
'Touchdown': 'sum'})
agg() takes a dictionary mapping each column name to an aggregating function
The aggregated columns will have the same names as the dictionary keys
To rename the columns, use tuple pairs: DF.groupby('game_id').agg(Yards=('yards_gained', 'sum'), Nplays=('Play_id', 'count'), intercep=('Intercep', 'sum'), Touchdown=('Touchdown', 'sum'))
This no longer passes a dictionary; instead, agg() takes keyword arguments, each in the format
new_var = ('old_var', 'function-as-a-string-name')
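The two agg() styles can be compared side by side. This is a sketch with invented data and lowercase column names chosen for illustration (they are assumptions, not the names used in the cards above).

```python
import pandas as pd

# Hypothetical play-by-play data.
plays = pd.DataFrame({
    'game_id': [1, 1, 2, 2, 2],
    'play_id': [10, 11, 20, 21, 22],
    'yards_gained': [5, 12, 3, 0, 20],
    'touchdown': [0, 1, 0, 0, 1],
})

# Dictionary form: output columns keep the names of the dictionary keys.
dict_agg = plays.groupby('game_id').agg({
    'yards_gained': 'sum',
    'play_id': 'count',
    'touchdown': 'sum',
})

# Named aggregation: new_name=('old_column', 'function_name').
named_agg = plays.groupby('game_id').agg(
    yards=('yards_gained', 'sum'),
    nplays=('play_id', 'count'),
    touchdowns=('touchdown', 'sum'),
)
```

Both results hold the same numbers; only the column labels differ.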
Page 86
Stacking is similar to
Pivot table
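A rough sketch of how stacking relates to a pivot table, using a made-up frame with one row per player and week (all names are hypothetical). unstack() and pivot_table() both spread the week values across columns, and stack() moves them back into rows.

```python
import pandas as pd

# Hypothetical long data: one row per (player, week).
long_df = pd.DataFrame({
    'player': ['A', 'A', 'B', 'B'],
    'week': [1, 2, 1, 2],
    'yards': [50, 70, 30, 90],
})

# unstack: move the inner index level (week) from rows into columns.
wide = long_df.set_index(['player', 'week'])['yards'].unstack()

# pivot_table reaches the same shape starting from the flat frame
# (aggfunc only matters if a player/week pair appears more than once).
wide_pivot = long_df.pivot_table(
    index='player', columns='week', values='yards', aggfunc='sum'
)

# stack reverses it: the week columns go back into rows.
back_to_long = wide.stack()
```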
A join in Pandas is called
Merging or horizontal concatenation
A union in Pandas is called
Appending or vertical concatenation (done with pd.concat; DataFrame.append was removed in Pandas 2.0)
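A short sketch of both operations with invented tables (team codes and scores are made up for illustration): pd.merge for the join, pd.concat for the union.

```python
import pandas as pd

# Hypothetical tables sharing a game_id key.
games = pd.DataFrame({'game_id': [1, 2], 'home': ['KC', 'NE']})
scores = pd.DataFrame({'game_id': [1, 2], 'points': [27, 13]})

# Join (merge): combine columns from two tables on a shared key.
merged = pd.merge(games, scores, on='game_id')

# Union (vertical concatenation): stack rows of tables with the same columns.
week1 = pd.DataFrame({'game_id': [1], 'points': [27]})
week2 = pd.DataFrame({'game_id': [2], 'points': [13]})
all_weeks = pd.concat([week1, week2], ignore_index=True)
```

Unlike a SQL UNION, pd.concat keeps duplicate rows (closer to UNION ALL); drop_duplicates() can be chained afterwards if they should go.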