pandas Flashcards

importing, filtering, slicing, querying

1
Q

Create a DataFrame df from the dictionary data, using labels as the index.

A

df = pd.DataFrame(data, index=labels)
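
The card does not show data or labels, so the sketch below uses a small hypothetical dictionary and label list (the column names follow the later cards; the values are invented for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the real dictionary is not shown on the card.
data = {
    "animal": ["cat", "cat", "snake", "dog"],
    "age": [2.5, 3.0, 0.5, np.nan],
    "visits": [1, 3, 2, 3],
    "priority": ["yes", "yes", "no", "yes"],
}
labels = ["a", "b", "c", "d"]  # one label per row

# Each dictionary key becomes a column; labels become the row index.
df = pd.DataFrame(data, index=labels)
print(df)
```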

2
Q

Display a summary of the basic information about this DataFrame and its data.

A

df.info()

…or, for summary statistics of the numeric columns…

df.describe()

3
Q

Return the first 3 rows of the DataFrame df.

A

df.iloc[:3]

or equivalently

df.head(3)

4
Q

Select just the ‘animal’ and ‘age’ columns from the DataFrame df.

A

df.loc[:, ['animal', 'age']]

or

df[['animal', 'age']]

5
Q

Select the data in the rows at positions [3, 4, 8] and in columns ['animal', 'age'].

A

df.loc[df.index[[3, 4, 8]], ['animal', 'age']]
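
Why df.index[[3, 4, 8]]? loc selects by label, so the row positions are first translated into labels. A minimal sketch on a hypothetical ten-row frame (the values are invented), with a purely positional iloc version for comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels 'a'..'j'.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "snake", "dog", "dog",
                   "cat", "snake", "cat", "dog", "dog"],
        "age": [2.5, 3.0, 0.5, np.nan, 5.0, 2.0, 4.5, np.nan, 7.0, 3.0],
        "visits": [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    },
    index=list("abcdefghij"),
)

# Translate positions 3, 4, 8 into their labels, then select by label.
rows = df.index[[3, 4, 8]]
by_label = df.loc[rows, ["animal", "age"]]

# The same selection done entirely by position.
col_positions = df.columns.get_indexer(["animal", "age"])
by_position = df.iloc[[3, 4, 8], col_positions]

print(by_label)
```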

6
Q

Select only the rows where the number of visits is greater than 3.

A

df[df['visits'] > 3]

7
Q

Select the rows where the age is missing, i.e. is NaN.

A

df[df['age'].isnull()]

8
Q

Select the rows where the animal is a cat and the age is less than 3.

A

df[(df['animal'] == 'cat') & (df['age'] < 3)]
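
The parentheses matter: in Python, & binds more tightly than == and <, so each comparison has to be wrapped before the masks are combined. A small sketch with hypothetical data, naming the two masks to make the logic explicit:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"animal": ["cat", "cat", "dog", "cat"], "age": [2.5, 3.0, 1.0, 0.5]},
    index=list("abcd"),
)

is_cat = df["animal"] == "cat"   # boolean Series, one value per row
is_young = df["age"] < 3         # boolean Series, one value per row

# Element-wise AND of the two masks keeps rows where both are True.
young_cats = df[is_cat & is_young]
print(young_cats)
```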

9
Q

Select the rows where the age is between 2 and 4 (inclusive).

A

df[df['age'].between(2, 4)]

10
Q

Change the age in row ‘f’ to 1.5.

A

df.loc['f', 'age'] = 1.5

11
Q

Calculate the sum of all visits (the total number of visits).

A

df['visits'].sum()

12
Q

Calculate the mean age for each different animal in df.

A

df.groupby('animal')['age'].mean()
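
A quick sketch of what the groupby returns, on hypothetical data: a Series indexed by animal, with missing ages skipped when the mean is computed.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one dog has a missing age.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "dog", "snake"],
        "age": [2.0, 4.0, 3.0, np.nan, 0.5],
    }
)

# Split rows by animal, take the 'age' column, average within each group.
mean_age = df.groupby("animal")["age"].mean()
print(mean_age)
```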

13
Q

Append a new row ‘k’ to df with your choice of values for each column. Then delete that row to return the original DataFrame.

A

df.loc['k'] = [5.5, 'dog', 'no', 2]

and then deleting the new row…

df = df.drop('k')
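
The list assigned to the new row has to follow the frame's column order. The sketch below assumes a hypothetical two-row frame whose columns are ordered age, animal, priority, visits, matching the order of the values on the card.

```python
import pandas as pd

# Hypothetical frame; the column order is an assumption made to match the card.
df = pd.DataFrame(
    {
        "age": [2.5, 3.0],
        "animal": ["cat", "dog"],
        "priority": ["yes", "no"],
        "visits": [1, 3],
    },
    index=["a", "b"],
)

# Assigning to a label that does not exist yet adds a new row.
df.loc["k"] = [5.5, "dog", "no", 2]
labels_after_add = list(df.index)

# drop returns a new frame without the row; reassign to keep the result.
df = df.drop("k")
print(df)
```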

14
Q

Count the number of each type of animal in df.

A

df['animal'].value_counts()

15
Q

Sort df first by the values in the 'age' column in descending order, then by the values in the 'visits' column in ascending order.

A

df.sort_values(by=['age', 'visits'], ascending=[False, True])
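
The second key only matters where the first one ties. A small sketch with hypothetical values, where rows 'a' and 'b' share the same age and are therefore ordered by visits:

```python
import pandas as pd

# Hypothetical data; rows 'a' and 'b' tie on age.
df = pd.DataFrame(
    {"age": [3.0, 3.0, 5.0, 1.0], "visits": [2, 1, 3, 1]},
    index=list("abcd"),
)

# One ascending flag per sort key: age descending, visits ascending.
sorted_df = df.sort_values(by=["age", "visits"], ascending=[False, True])
print(sorted_df)
```

sort_values returns a new DataFrame here; df itself keeps its original order.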

16
Q

The ‘priority’ column contains the values ‘yes’ and ‘no’. Replace this column with a column of boolean values: ‘yes’ should be True and ‘no’ should be False.

A

df['priority'] = df['priority'].map({'yes': True, 'no': False})
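
One thing worth knowing about map with a dictionary: values that are not among the dictionary's keys come back as NaN. A sketch on a hypothetical Series that includes a stray 'maybe' value (not part of the card's data):

```python
import pandas as pd

# Hypothetical column; 'maybe' is included only to show the unmapped case.
priority = pd.Series(["yes", "no", "yes", "maybe"])

# Each value is looked up in the dictionary; missing keys give NaN.
mapped = priority.map({"yes": True, "no": False})
print(mapped)
```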

17
Q

In the ‘animal’ column, change the ‘snake’ entries to ‘python’.

A

df['animal'] = df['animal'].replace('snake', 'python')

18
Q

For each animal type and each number of visits, find the mean age. In other words, each row is an animal, each column is a number of visits and the values are the mean ages (hint: use a pivot table).

A

df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')
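
To see the shape of the result, here is a sketch on a small hypothetical frame. Combinations of animal and visits that never occur in the data show up as NaN.

```python
import pandas as pd

# Hypothetical data: two cats with 1 visit, dogs with 2 and 3 visits.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "dog"],
        "visits": [1, 1, 2, 3],
        "age": [2.0, 4.0, 5.0, 7.0],
    }
)

# Rows are animals, columns are visit counts, cells are mean ages.
table = df.pivot_table(index="animal", columns="visits", values="age", aggfunc="mean")
print(table)
```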