Python - Pandas Library Flashcards

Question 1

Q

Syntax
Creating DataFrames

Create a DataFrame using dict format, specifying values for columns

Answer

A

import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6],
                   "b": [7, 8, 9],
                   "c": [10, 11, 12]},
                  index=[1, 2, 3])

Question 2

Q

Syntax

Creating DataFrames

Create a DataFrame using list format, specifying values for each row

Answer

A

df = pd.DataFrame([[4, 7, 10], [5, 8, 11], [6, 9, 12]],
                  index=[1, 2, 3],
                  columns=['a', 'b', 'c'])
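As a quick check, the dict form and the list form should build the same table. This is a minimal sketch; the names df_from_dict, df_from_rows and same are illustrative and not part of the cards.

```python
import pandas as pd

# Column-wise: each dict key becomes a column
df_from_dict = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=[1, 2, 3],
)

# Row-wise: each inner list becomes a row
df_from_rows = pd.DataFrame(
    [[4, 7, 10], [5, 8, 11], [6, 9, 12]],
    index=[1, 2, 3],
    columns=["a", "b", "c"],
)

# equals() compares values, index, columns and dtypes
same = df_from_dict.equals(df_from_rows)
print(same)
```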

Question 3

Q

Method Chaining

Most pandas methods return a DataFrame, so another method can be chained onto the result.

Melt a DF, rename the columns, and then query it.

Answer

A

df = (pd.melt(df)
      .rename(columns={'variable': 'var', 'value': 'val'})
      .query('val >= 200')
)
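A small worked example of the chain, using made-up numbers so that two of the four melted rows pass the filter. The input values and the name result are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [100, 250], "b": [300, 50]})

result = (
    pd.melt(df)  # long format with columns 'variable' and 'value'
    .rename(columns={"variable": "var", "value": "val"})
    .query("val >= 200")  # keep rows where val is at least 200
)
print(result)
```

Each step returns a new DataFrame, so the original df is left untouched unless the result is assigned back to it.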

Question 4

Q

Reshaping Data

Gather the columns into rows

Answer

A

pd.melt(df)

Question 5

Q

Reshaping Data

Spread rows into columns

Answer

A

df.pivot(columns='var',
         values='val')
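A sketch of pivot on a small long-format table. The 'id' column and the index='id' argument are assumptions added here so that each output row has a clear label; the card itself only passes columns and values.

```python
import pandas as pd

long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "var": ["a", "b", "a", "b"],
    "val": [4, 7, 5, 8],
})

# One row per id, one column per distinct value of 'var'
wide_df = long_df.pivot(index="id", columns="var", values="val")
print(wide_df)
```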

Question 6

Q

Reshaping Data

Append rows of df2 to df1

Answer

A

pd.concat([df1, df2])

Question 7

Q

Reshaping Data

Append columns of df2 to df1

Answer

A

pd.concat([df1, df2],
          axis=1)
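The two concat directions are easiest to tell apart by the resulting shape. A minimal sketch with two hypothetical 2x2 frames; ignore_index=True is an optional extra shown here because row-wise concat keeps the original index labels otherwise.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

stacked = pd.concat([df1, df2])                               # 4 rows, index 0, 1, 0, 1
stacked_reindexed = pd.concat([df1, df2], ignore_index=True)  # 4 rows, index 0..3
side_by_side = pd.concat([df1, df2], axis=1)                  # 2 rows, columns a, b, a, b

print(stacked.shape, side_by_side.shape)
```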

Question 8

Q

Data Structures

What’s a Series and how do you define it?

Answer

A

A Series is a one-dimensional labeled array: a list of values, each paired with an index label.

s = pd.Series([3, -5, 7, 4],
              index=['a', 'b', 'c', 'd'])

Question 9

Q

Data Structures

What’s a DataFrame and how do you define it?

Answer

A

A DataFrame is a two-dimensional labeled table: a set of Series (the columns) that share one index.

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}

df = pd.DataFrame(data,
                  columns=['Country', 'Capital', 'Population'])

Question 10

Q

Selection

How do you get one column of values?

Answer

A

You access it like a dictionary, using the column name as the key.

col_of_values = df['col_name']

Question 11

Q

Selection

How do you get a subset of the DataFrame, or get multiple columns?

Answer

A

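A plain slice works like list slicing, but it selects rows, not columns:

df[start_row:stop_row]

For multiple columns, pass a list of column names, or slice by position with iloc:

df[['col_a', 'col_b']]

df.iloc[:, start_col:stop_col]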
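A short sketch of the difference between slicing rows and picking columns. The frame and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]})

first_two_rows = df[0:2]            # a plain slice selects rows by position
two_cols = df[["a", "b"]]           # a list of names selects columns
cols_by_position = df.iloc[:, 0:2]  # all rows, first two columns

print(first_two_rows.shape, two_cols.shape)
```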

Question 12

Q

Selecting

Select value(s) in a row with a named index.

Answer

A

df.loc[row_label]

Question 13

Q

Selecting

Select the rows that meet a certain condition, keeping only specific columns.

Answer

A

df.loc[df['a'] > 10,
       ['a', 'c']]

Question 14

Q

Selecting

Select value(s) in a row by integer position (numbered / default index).

Answer

A

df.iloc[row_position]
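To compare loc and iloc side by side, here is a minimal sketch on a frame with string labels. The labels 'x', 'y', 'z' and the values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [4, 15, 20], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=["x", "y", "z"],
)

row_by_label = df.loc["y"]                 # row whose index label is "y"
row_by_position = df.iloc[1]               # second row, counting from 0
subset = df.loc[df["a"] > 10, ["a", "c"]]  # rows where a > 10, columns a and c only

print(subset)
```

loc always works with labels (or a boolean mask), while iloc always works with integer positions, whatever the index labels happen to be.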

Question 15

Q

Subset Observations / Rows

Extract rows that meet a logical condition

Answer

A

df[df.Length > 7]

Question 16

Q

Subset Observations / Rows

Remove duplicate rows

Answer

A

df.drop_duplicates()

Question 17

Q

Subset Observations / Rows

Select first n rows

Answer

A

df.head(n)

Question 18

Q

Subset Observations / Rows

Select last n rows

Answer

A

df.tail(n)

Question 19

Q

Subset Observations / Rows

Select rows by default indexed position

Answer

A

df.iloc[10:20]

Question 20

Q

Subset Variables / Columns

Select multiple columns with specific names.

Answer

A

df[['width', 'length', 'species']]

*Note that a list of column names goes inside the indexing brackets.

Question 21

Q

Subset Variables / Columns

Select single column with specific name.

Answer

A

df['width']

or

df.width

Question 22

Q

Parameter accepted by many DataFrame methods that makes them modify the original DataFrame in place.

Answer

A

inplace=True
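A small sketch of what the parameter changes, using sort_values as the example method. The value is the boolean True, not a string, and with it the method returns None instead of a new DataFrame. The frame and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"mpg": [30, 18, 24]})

sorted_copy = df.sort_values("mpg")             # returns a new DataFrame; df is unchanged
returned = df.sort_values("mpg", inplace=True)  # sorts df itself and returns None

print(returned)
print(df)
```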

Question 23

Q

Sort

Order the rows by values of a column (low to high)

Answer

A

df.sort_values('mpg')

Question 24

Q

Sort

Order rows by values of a column (from high to low)

Answer

A

df.sort_values('mpg', ascending=False)

Question 25

Q

Rename

Rename the columns of a DataFrame

Answer

A

df.rename(columns={'y': 'year'})

Question 26

Q

Drop

Drop certain columns from a DataFrame

Answer

A

df.drop(columns=['Length', 'Height'])

Question 27

Q

Handling Missing Data

Drop rows with any column having NA/null data

Answer

A

df.dropna()

Question 28

Q

Handling Missing Data

Replace all NA/null data with a value.

Answer

A

df.fillna( value )

Question 29

Q

Handling Missing Data

Replace values with others

Answer

A

df.replace('a', 'f')
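The three missing-data cards can be tried together on a tiny numeric frame. This is a sketch with made-up values; the fill value 0 and the replacement 3.0 to 30.0 are arbitrary choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

dropped = df.dropna()            # keeps only rows with no missing values
filled = df.fillna(0)            # replaces every missing value with 0
replaced = df.replace(3.0, 30.0) # swaps one value for another, NaN left as is

print(dropped)
print(filled)
```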

Question 30

Q

Duplicate Data

Return unique values

Answer

A

df['col_name'].unique()

*unique() is a Series method, so select a column first.

Question 31

Q

Duplicate Data

Drop duplicates

Answer

A

df.drop_duplicates()

Question 32

Q

Summarize Data

Count the number of rows with non-null values of each object

Answer

A

df['w'].count()

…or, to count how many times each distinct value occurs…

df['w'].value_counts()
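A minimal sketch of how the two counts differ on a column that has one duplicate and one missing value. The column contents are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"w": [1.0, 2.0, 2.0, np.nan]})

non_null = df["w"].count()          # number of non-null entries in the column
per_value = df["w"].value_counts()  # occurrences of each distinct value; NaN is excluded by default

print(non_null)
print(per_value)
```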

Question 33

Q

Summarize Data

Give the number of rows in a DataFrame

Answer

A

len(df)

Question 34

Q

Summarize Data

Give the number of rows and columns of a DataFrame

Answer

A

df.shape

Question 35

Q

Summarize Data

Describe the DataFrame columns

Answer

A

df.columns
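shape and columns are attributes rather than methods, so they are read without parentheses. A quick sketch on a hypothetical 3x2 frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9]})

n_rows = len(df)              # number of rows
rows_and_cols = df.shape      # tuple of (rows, columns)
col_names = list(df.columns)  # column labels as a plain list

print(n_rows, rows_and_cols, col_names)
```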

Question 36

Q

Summarize Data

Describe the basic DataFrame information

Answer

A

df.info()

Question 37

Q

Summarize Data

Describe the basic descriptive statistics for each column (or GroupBy)

Answer

A

df.describe()

Question 38

Q

Summarize Data

Sum the values of a column

Answer

A

df['col_name'].sum()

Question 39

Q

Summarize Data

Give the median and mean of a column.

Answer

A

df['col_name'].median()
df['col_name'].mean()

Question 40

Q

Summarize Data

Give the min, max, variance, and standard deviation of one or more columns.

Answer

A

df.min()
df.max()
df.var()
df.std()
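The summary methods can be checked by hand on three values. A sketch with a made-up 'mpg' column; note that var() and std() use the sample formulas (dividing by n - 1) by default.

```python
import pandas as pd

df = pd.DataFrame({"mpg": [18, 24, 30]})

total = df["mpg"].sum()      # 18 + 24 + 30
middle = df["mpg"].median()  # middle value of the sorted column
average = df["mpg"].mean()
lowest = df["mpg"].min()
highest = df["mpg"].max()
variance = df["mpg"].var()   # sample variance: (36 + 0 + 36) / 2
spread = df["mpg"].std()     # square root of the sample variance

print(total, middle, average, lowest, highest, variance, spread)
```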

Question 41

Q

Apply Function

Apply a function to the values of each column

*Note: df.apply passes each whole column to the function as a Series, so a function built from vectorized operations (like x*2) stays efficient and fast

Answer

A

function = lambda x: x*2

df.apply(function)
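A sketch of how the function is actually called. The frame and the names double, applied, one_column and direct are illustrative. For simple arithmetic like this, operating on the DataFrame directly gives the same result without apply.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

double = lambda x: x * 2

applied = df.apply(double)          # double is called once per column, receiving a Series
one_column = df["a"].apply(double)  # Series.apply calls double once per element
direct = df * 2                     # plain vectorized arithmetic, same values as applied

print(applied)
```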