Pandas 4 Granularity Flashcards

Question 1

Q

If data is a collection of structured information, _____________ is the level of detail your collection is at.

Answer

A

Granularity
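One practical way to check a dataset's granularity is to test which columns uniquely identify its rows. This is a minimal sketch on a small made-up play-by-play DataFrame; the name pbp, the helper is_unique_at, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical play-by-play data: one row per play (made-up values)
pbp = pd.DataFrame({
    'game_id': [1, 1, 1, 2, 2],
    'play_id': [10, 11, 12, 10, 11],
    'yards_gained': [5, -2, 12, 7, 3],
})

def is_unique_at(df, cols):
    """Hypothetical helper: True if the given columns uniquely identify every row."""
    return not df.duplicated(subset=cols).any()

print(is_unique_at(pbp, ['game_id', 'play_id']))  # True: the data is at the play level
print(is_unique_at(pbp, ['game_id']))             # False: several plays share a game
```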

Question 2

Q

Aggregating data to be less granular is called __________

Answer

A

Grouping

It necessarily involves a loss of detail.

Question 3

Q

Stacking

Answer

A

Aka reshaping

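Moves data that was formerly in separate rows into separate columns (long to wide) without aggregating it. In pandas this direction is unstack(); stack() goes the other way, from columns into rows.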
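A minimal sketch of moving rows into columns with unstack(), assuming made-up long data with one row per (game, team) pair; the DataFrame and its column names are hypothetical. Calling stack() on the result would move the team columns back into rows.

```python
import pandas as pd

# Hypothetical long data: one row per (game_id, team) pair (made-up values)
long_df = pd.DataFrame({
    'game_id': [1, 1, 2, 2],
    'team': ['home', 'away', 'home', 'away'],
    'points': [24, 17, 10, 13],
})

# Put both identifiers in the index, then move the 'team' level into columns.
# Nothing is aggregated: each (game_id, team) value lands in its own cell.
wide = long_df.set_index(['game_id', 'team'])['points'].unstack()
print(wide)
```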

Question 4

Q

What is the syntax for grouping in Pandas?

Answer

A

DF.groupby('column').sum()

sum(), or whatever aggregation function is needed (e.g. mean(), count(), max())

By default, the grouped column becomes the new index; pass as_index=False to groupby() to keep it as a regular column.
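A minimal sketch of grouping play-level data up to the game level, assuming a made-up DataFrame named pbp (name and values are hypothetical). It also shows the loss of detail: five plays collapse into two game rows.

```python
import pandas as pd

# Hypothetical play-level data (made-up values)
pbp = pd.DataFrame({
    'game_id': [1, 1, 1, 2, 2],
    'yards_gained': [5, -2, 12, 7, 3],
    'touchdown': [0, 0, 1, 0, 0],
})

# Default: game_id becomes the index of the result
by_game = pbp.groupby('game_id').sum()

# as_index=False keeps game_id as an ordinary column
by_game_flat = pbp.groupby('game_id', as_index=False).sum()

print(len(pbp), '->', len(by_game))  # 5 plays collapse to 2 games; play detail is lost
```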

Question 5

Q

How do you return a groupby result with just the columns of interest?

Answer

A

sum_cols = ['col1', 'col2', 'col3']

DF.groupby('game_id').sum()[sum_cols]
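A minimal sketch comparing the card's approach with selecting the columns before aggregating, assuming a made-up DataFrame named pbp (hypothetical). Selecting first means only the columns of interest are summed; meaningless sums such as play_id are never computed.

```python
import pandas as pd

# Hypothetical play-level data (made-up values)
pbp = pd.DataFrame({
    'game_id': [1, 1, 2],
    'play_id': [10, 11, 10],          # summing this is meaningless, so we drop it
    'yards_gained': [5, 12, 7],
    'touchdown': [0, 1, 0],
})

sum_cols = ['yards_gained', 'touchdown']

# Card version: aggregate every column, then keep the columns of interest
totals = pbp.groupby('game_id').sum()[sum_cols]

# Alternative: select the columns first, so only they are aggregated
totals_alt = pbp.groupby('game_id')[sum_cols].sum()
```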

Question 6

Q

How do you use the agg() function?

Answer

A

DF.groupby('game_id').agg({'yards_gained': 'sum',
                           'play_id': 'count',
                           'intercep': 'sum',
                           'touchdown': 'sum'})

agg() takes a dictionary that maps each column to an aggregation function

The output columns have the same names as the dictionary keys
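A minimal sketch of the dictionary form of agg(), assuming a made-up DataFrame named pbp. The column names follow the card ('intercep' is kept as written there); the data is hypothetical.

```python
import pandas as pd

# Hypothetical play-level data (made-up values)
pbp = pd.DataFrame({
    'game_id': [1, 1, 1, 2, 2],
    'play_id': [10, 11, 12, 10, 11],
    'yards_gained': [5, -2, 12, 7, 3],
    'intercep': [0, 1, 0, 0, 0],
    'touchdown': [0, 0, 1, 0, 0],
})

# Dictionary form: {column: aggregation}; output columns keep the key names
game = pbp.groupby('game_id').agg({'yards_gained': 'sum',
                                   'play_id': 'count',
                                   'intercep': 'sum',
                                   'touchdown': 'sum'})
print(game)
```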

To rename the columns, use named aggregation with tuple pairs:
DF.groupby('game_id').agg(
    yards=('yards_gained', 'sum'),
    nplays=('play_id', 'count'),
    intercep=('intercep', 'sum'),
    touchdown=('touchdown', 'sum'))

This no longer passes a dictionary; instead, agg() takes keyword arguments, each in the format
new_var=('old_var', 'function_name_as_string')

Page 86
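A minimal sketch of named aggregation, which renames the output columns while aggregating. It assumes the same kind of made-up pbp DataFrame as above; names and values are hypothetical.

```python
import pandas as pd

# Hypothetical play-level data (made-up values)
pbp = pd.DataFrame({
    'game_id': [1, 1, 1, 2, 2],
    'play_id': [10, 11, 12, 10, 11],
    'yards_gained': [5, -2, 12, 7, 3],
    'intercep': [0, 1, 0, 0, 0],
    'touchdown': [0, 0, 1, 0, 0],
})

# Each keyword is new_column=(source_column, aggregation_name)
game = pbp.groupby('game_id').agg(
    yards=('yards_gained', 'sum'),
    nplays=('play_id', 'count'),
    intercep=('intercep', 'sum'),
    touchdown=('touchdown', 'sum'),
)
print(game)
```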

Question 7

Q

Stacking is similar to

Answer

A

Pivot table
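A minimal sketch of why the two are similar, assuming made-up long data (hypothetical names and values). pivot_table() builds the same wide layout as unstack(); the difference is that pivot_table() aggregates with aggfunc, so it also works when the row/column pairs are not unique, while unstack() requires them to be.

```python
import pandas as pd

# Hypothetical long data: one row per (game_id, team) pair (made-up values)
long_df = pd.DataFrame({
    'game_id': [1, 1, 2, 2],
    'team': ['home', 'away', 'home', 'away'],
    'points': [24, 17, 10, 13],
})

# pivot_table: rows from one column, columns from another, values aggregated
wide_pt = long_df.pivot_table(index='game_id', columns='team',
                              values='points', aggfunc='sum')

# Reshaping with unstack() gives the same layout here,
# because each (game_id, team) pair is unique
wide_un = long_df.set_index(['game_id', 'team'])['points'].unstack()
```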

Question 8

Q

A join in Pandas is called

Answer

A

Merging or horizontal concatenation
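A minimal sketch of both horizontal operations, assuming two small made-up game-level DataFrames (hypothetical names and values). pd.merge() joins on a key column, like a SQL join; pd.concat(axis=1) lines frames up side by side on their index.

```python
import pandas as pd

# Hypothetical game-level tables sharing a game_id key (made-up values)
games = pd.DataFrame({'game_id': [1, 2], 'home_team': ['A', 'B']})
totals = pd.DataFrame({'game_id': [1, 2], 'yards': [15, 10]})

# Merge: SQL-style join on the shared key column
merged = pd.merge(games, totals, on='game_id', how='left')

# Horizontal concatenation: align on the index and place columns side by side
side_by_side = pd.concat([games.set_index('game_id'),
                          totals.set_index('game_id')], axis=1)
```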

Question 9

Q

A union in Pandas is called

Answer

A

Appending or vertical concatenation (use pd.concat(); DataFrame.append() was removed in pandas 2.0)
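A minimal sketch of vertical concatenation with pd.concat(), assuming two made-up DataFrames with the same columns (hypothetical names and values).

```python
import pandas as pd

# Hypothetical tables with identical columns (made-up values)
week1 = pd.DataFrame({'game_id': [1, 2], 'yards': [15, 10]})
week2 = pd.DataFrame({'game_id': [3, 4], 'yards': [20, 8]})

# Stack the rows of both frames; ignore_index=True renumbers the index 0..n-1
all_games = pd.concat([week1, week2], ignore_index=True)
```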


(9 cards)