Python - Pandas Library Flashcards

1
Q

Syntax
Creating DataFrames

Create a DataFrame using dict format, specifying values for columns

A

df = pd.DataFrame({"a": [4, 5, 6],
                   "b": [7, 8, 9],
                   "c": [10, 11, 12]},
                  index=[1, 2, 3])

2
Q

Syntax

Creating DataFrames

Create a DataFrame using list format, specifying values for each row

A

df = pd.DataFrame([[4, 7, 10], [5, 8, 11], [6, 9, 12]],
                  index=[1, 2, 3],
                  columns=['a', 'b', 'c'])
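
As a quick check, here is a minimal sketch showing that the column-wise dict form and the row-wise list form describe the same table. The names df_cols and df_rows are illustrative and not part of the card.

```python
import pandas as pd

# Column-wise: each dict key is a column name.
df_cols = pd.DataFrame({"a": [4, 5, 6],
                        "b": [7, 8, 9],
                        "c": [10, 11, 12]},
                       index=[1, 2, 3])

# Row-wise: each inner list is one row.
df_rows = pd.DataFrame([[4, 7, 10], [5, 8, 11], [6, 9, 12]],
                       index=[1, 2, 3],
                       columns=["a", "b", "c"])

# equals() compares values, labels, and dtypes.
same = df_cols.equals(df_rows)
```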

3
Q

Method Chaining

Most pandas methods return a DataFrame, so another method can be applied directly to the result.

Melt a DF, rename the columns, and then query it.

A

df = (pd.melt(df)
      .rename(columns={'variable': 'var', 'value': 'val'})
      .query('val >= 200')
      )
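
A runnable sketch of the same chain may help. The sample values are made up for illustration; the card does not specify any data.

```python
import pandas as pd

# Hypothetical sample data, chosen so some values pass the filter.
df = pd.DataFrame({"a": [100, 250], "b": [300, 50]})

result = (
    pd.melt(df)  # wide -> long, producing columns 'variable' and 'value'
    .rename(columns={"variable": "var", "value": "val"})
    .query("val >= 200")  # keep rows where val is at least 200
)
```

Each step returns a new DataFrame, which is what lets the next method attach to it.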

4
Q

Reshaping Data

Gather the columns into rows

A

pd.melt(df)

5
Q

Reshaping Data

Spread rows into columns

A

df.pivot(columns='var',
         values='val')
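
To see melt and pivot as opposites, here is a small round-trip sketch. It assumes the row labels are kept as an id column (via reset_index and id_vars), which the cards do not show; the names wide, long, and back are illustrative.

```python
import pandas as pd

wide = pd.DataFrame({"a": [4, 5], "b": [7, 8]}, index=[1, 2])

# Keep the row label as a column so each long row remembers its origin.
long = pd.melt(wide.reset_index(), id_vars="index",
               var_name="var", value_name="val")

# Spread the 'var' values back out into columns.
back = long.pivot(index="index", columns="var", values="val")
```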

6
Q

Reshaping Data

Append rows of df2 to df1

A

pd.concat([df1, df2])

7
Q

Reshaping Data

Append columns of df2 to df1

A

pd.concat([df1, df2],
          axis=1)
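
A short sketch of both directions of concat, using made-up frames (df3 and the ignore_index variant are additions for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})
df3 = pd.DataFrame({"b": [5, 6]})

rows = pd.concat([df1, df2])  # stacks rows; index labels repeat
rows_clean = pd.concat([df1, df2], ignore_index=True)  # renumbers from 0
cols = pd.concat([df1, df3], axis=1)  # aligns on the index, adds 'b' beside 'a'
```

Note that the default row-wise concat keeps the original index labels, so duplicates can appear unless ignore_index=True is passed.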

8
Q

Data Structures

What’s a Series and how do you define it?

A

A Series is a one-dimensional labeled array: a list of values, each paired with an index label.

s = pd.Series([3, -5, 7, 4],
              index=['a', 'b', 'c', 'd'])
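
A brief sketch of what the index adds over a plain list. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=["a", "b", "c", "d"])

by_label = s["b"]        # look up by index label
by_position = s.iloc[2]  # look up by integer position
positives = s[s > 0]     # boolean filtering keeps the labels
```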

9
Q

Data Structures

What’s a DataFrame and how do you define it?

A

A DataFrame is a two-dimensional labeled table: a collection of Series (the columns) that share one index.

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}

df = pd.DataFrame(data,
                  columns=['Country', 'Capital', 'Population'])

10
Q

Selection

How do you get one column of values?

A

You access it like a dictionary, by referencing the key (in this case, the column name).

col_of_values = df['col_name']

11
Q

Selection

How do you get a subset of the DataFrame, or get multiple columns?

A

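Slicing works like list slicing, but a slice inside plain brackets selects rows, not columns.

df[start_row:stop_row]

For a range of columns by position, use iloc:

df.iloc[:, start_col:stop_col]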
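
A small sketch of how slicing behaves in practice, with made-up sample data: a slice inside plain brackets works on rows, while iloc accepts a second slice for columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]})

first_two_rows = df[0:2]          # plain brackets with a slice select rows
first_two_cols = df.iloc[:, 0:2]  # iloc[rows, cols] selects columns by position
```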

12
Q

Selecting

Select value(s) in a row with a named index.

A

df.loc[row_label]

13
Q

Selecting

Select the rows that meet a condition, keeping only specific columns.

A

df.loc[df['a'] > 10,
       ['a', 'c']]
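
A runnable sketch of both loc forms, label lookup and condition plus column list. The data and the labels x, y, z are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 15, 26], "b": [7, 8, 9], "c": [10, 11, 12]},
                  index=["x", "y", "z"])

one_row = df.loc["y"]                      # row by label, returned as a Series
subset = df.loc[df["a"] > 10, ["a", "c"]]  # rows where a > 10, columns a and c only
```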

14
Q

Selecting

Select value(s) in a row by its integer position (0-based), regardless of the index labels.

A

df.iloc[row_position]
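
The difference between loc and iloc is easiest to see when the index labels are integers that do not start at 0. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6]}, index=[1, 2, 3])

by_label = df.loc[1, "a"]     # label 1 is the first row
by_position = df.iloc[1, 0]   # position 1 is the second row
```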

15
Q

Subset Observations / Rows

Extract rows that meet a logical condition

A

df[df.Length > 7]

16
Q

Subset Observations / Rows

Remove duplicate rows

A

df.drop_duplicates()

17
Q

Subset Observations / Rows

Select first n rows

A

df.head(n)

18
Q

Subset Observations / Rows

Select last n rows

A

df.tail(n)

19
Q

Subset Observations / Rows

Select rows by integer position

A

df.iloc[10:20]

20
Q

Subset Variables / Columns

Select multiple columns with specific names.

A

df[['width', 'length', 'species']]

*Note that you pass a list of column names inside the brackets.

21
Q

Subset Variables / Columns

Select single column with specific name.

A

df['width']

or

df.width
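
A short sketch of what each selection style returns. The sample values are invented; the column names follow the cards.

```python
import pandas as pd

df = pd.DataFrame({"width": [1.0, 2.0],
                   "length": [3.0, 4.0],
                   "species": ["x", "y"]})

as_series = df["width"]              # one name -> Series
as_frame = df[["width", "length"]]   # list of names -> DataFrame
same = df.width.equals(df["width"])  # attribute access gives the same column
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the bracket form is the safer default.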

22
Q

Parameter accepted by many DataFrame methods to modify the original DataFrame in place instead of returning a new one.

A

inplace=True

*This is the boolean True, not the string 'True'.
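
A minimal sketch of the in-place behavior, using rename as the example method (the choice of method and the data are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"y": [2019, 2020]})

# With inplace=True the method changes df itself and returns None.
returned = df.rename(columns={"y": "year"}, inplace=True)
```

Because the call returns None, writing df = df.rename(..., inplace=True) would replace the DataFrame with None.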

23
Q

Sort

Order the rows by values of a column (low to high)

A

df.sort_values('mpg')

24
Q

Sort

Order rows by values of a column (from high to low)

A

df.sort_values('mpg', ascending=False)

25
Q

Rename

Rename the columns of a DataFrame

A

df.rename(columns={'y': 'year'})

26
Q

Drop

Drop certain columns from a DataFrame

A

df.drop(columns=['Length', 'Height'])

27
Q

Handling Missing Data

Drop rows with any column having NA/null data

A

df.dropna()

28
Q

Handling Missing Data

Replace all NA/null data with a value.

A

df.fillna(value)

29
Q

Handling Missing Data

Replace values with others

A

df.replace('a', 'f')
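
The three missing-data calls can be compared side by side in a small sketch. The data is made up, and 0 and 9.0 are arbitrary fill and replacement values.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"a": [1.0, nan, 3.0], "b": [4.0, 5.0, nan]})

dropped = df.dropna()          # keeps only rows with no missing values
filled = df.fillna(0)          # every missing cell becomes 0
replaced = df.replace(1.0, 9.0)  # swap one value for another
```

All three return new DataFrames and leave df unchanged.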

30
Q

Duplicate Data

Return unique values

A

df['w'].unique()

*unique() is a Series method, so call it on a single column.

31
Q

Duplicate Data

Drop duplicates

A

df.drop_duplicates()

32
Q

Summarize Data

Count the number of rows with non-null values in a column

A

df['w'].count()

...or, to count how many times each distinct value occurs...

df['w'].value_counts()
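
A quick sketch contrasting the counting methods on a made-up column with one missing value:

```python
import pandas as pd

df = pd.DataFrame({"w": ["x", "y", "x", None]})

non_null = df["w"].count()          # missing values are skipped
per_value = df["w"].value_counts()  # occurrences of each distinct value
distinct = df["w"].unique()         # distinct values, missing value included
```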

33
Q

Summarize Data

Give # rows in a DataFrame

A

len(df)

34
Q

Summarize Data

Give the # of rows and columns of a DataFrame

A

df.shape

*shape is an attribute, not a method, so there are no parentheses. It returns (rows, columns).

35
Q

Summarize Data

List the DataFrame's column labels

A

df.columns

*columns is an attribute, not a method, so there are no parentheses.
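
A small sketch of the size and column lookups on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9]})

n_rows = len(df)          # number of rows
shape = df.shape          # (rows, columns) tuple, accessed without parentheses
names = list(df.columns)  # column labels as a plain list
```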

36
Q

Summarize Data

Describe the basic DataFrame information

A

df.info()

37
Q

Summarize Data

Describe the basic descriptive statistics for each column (or GroupBy)

A

df.describe()

38
Q

Summarize Data

Sum the values of a column

A

df['w'].sum()

*df.sum() returns one total per column.

39
Q

Summarize Data

Give the median and mean of a column.

A

df['w'].median()
df['w'].mean()

40
Q

Summarize Data

Give the min, max, variance, and standard deviation of one or more columns.

A

df.min()
df.max()
df.var()
df.std()
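
A short sketch of the summary methods on a single made-up column, plus the whole-frame form:

```python
import pandas as pd

df = pd.DataFrame({"w": [1.0, 2.0, 3.0, 10.0]})

total = df["w"].sum()      # 16.0
median = df["w"].median()  # 2.5
mean = df["w"].mean()      # 4.0
per_column = df.sum()      # Series with one total per column
```

Called on a Series these return a single number; called on a DataFrame they return one value per column.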

41
Q

Apply Function

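Apply a function to each value (all rows) of a column

*Note: df.apply passes each column to the function as a whole Series, so a function built from vectorized operations such as x * 2 stays efficient and fast

A

function = lambda x: x * 2

df.apply(function)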
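
A minimal sketch of how apply behaves on a DataFrame versus a Series. The data and the name double are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

double = lambda x: x * 2

via_apply = df.apply(double)     # called once per column; each x is a Series
direct = df * 2                  # same result without apply
one_col = df["a"].apply(double)  # on a Series, called once per element
```

For simple arithmetic like this, the direct expression is usually the simpler choice; apply is more useful when the function has no built-in vectorized equivalent.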