Pandas Flashcards

1
Q

How would you import pandas and check its version?

A
import pandas as pd
pd.__version__
2
Q

What are the primary data structures in pandas?

A

DataFrame, which is like a relational table with rows and named columns, and Series, which is a single column. A DataFrame contains one or more Series.
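
A minimal sketch of how the two structures relate, using a small made-up table (the column names and values are illustrative only). Selecting one column of a DataFrame returns a Series that shares the DataFrame's index.

```python
import pandas as pd

# Hypothetical two-column table, used only for illustration.
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]})

column = df["value"]          # selecting a single column yields a Series

print(type(df).__name__)      # DataFrame
print(type(column).__name__)  # Series
print(column.index.equals(df.index))  # True: same row labels
```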

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How could you manually construct a DataFrame with the Series ‘City name’ and ‘Population’?

A
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])
pd.DataFrame({ 'City name': city_names, 'Population': population })
4
Q

How would you read a CSV file and display basic statistics or the first few rows?

A
df = pd.read_csv('/file.csv', sep=",")
df.describe()  # To show summary statistics
df.head()  # To show the first few rows
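
A self-contained sketch of the same steps that runs without a file on disk. The CSV text is made up for illustration, and `io.StringIO` stands in for a real path.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = (
    "city,population\n"
    "San Francisco,852469\n"
    "San Jose,1015785\n"
    "Sacramento,485199\n"
)

df = pd.read_csv(io.StringIO(csv_text))  # sep="," is the default

stats = df.describe()   # by default, summarizes only the numeric columns
first_two = df.head(2)  # head() takes an optional row count (default 5)

print(stats)
print(first_two)
```

Because `describe()` skips non-numeric columns by default, only `population` appears in the summary here.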
5
Q

If you had a DataFrame “df” with a column “age”, how would you show a histogram of this column?

A
df.hist('age')
6
Q

How would you divide all entries in a series by 1000?

A

Just use the basic math operations, e.g.

new_series = my_series / 1000
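
A short sketch of what the division does, with made-up values. The operation is applied to every element and returns a new Series, so the original is left unchanged.

```python
import pandas as pd

# Illustrative values only.
my_series = pd.Series([852469, 1015785, 485199])

new_series = my_series / 1000  # true division, so the result is float

print(new_series.dtype)  # float64
print(my_series.dtype)   # int64 (the original Series is not modified)
```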
7
Q

How could you perform more complex operations to map a series to a new series?

A

Use the Series.apply method with a lambda to convert the values, e.g.

is_over1m_series = pop_series.apply(lambda val: val > 1000000)

You can also apply NumPy functions directly to a pandas Series, e.g.

import numpy as np
np.log(pop_series)
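
For a simple comparison like this one, the operator can also be applied to the whole Series at once, which gives the same result as `apply` with a lambda. A minimal sketch, assuming the population values used on an earlier card:

```python
import numpy as np
import pandas as pd

pop_series = pd.Series([852469, 1015785, 485199])

via_apply = pop_series.apply(lambda val: val > 1000000)  # one element at a time
vectorized = pop_series > 1000000                        # whole Series at once

print(vectorized.tolist())                # [False, True, False]
print((via_apply == vectorized).all())    # True

# NumPy functions return a Series that keeps the original index.
logged = np.log(pop_series)
print(logged.index.equals(pop_series.index))  # True
```

`apply` remains the more general tool when the per-element logic cannot be written with operators or NumPy functions.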
8
Q

How can you add a new series to a dataframe?

A

Use dictionary-style assignment with the new column name as the key, e.g.

cities['Area'] = pd.Series([46.87, 176.53, 97.92])
cities['Density'] = cities['Population'] / cities['Area']
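
One detail worth knowing: an assigned Series is matched to the DataFrame by index label, not by position. The sketch below rebuilds the `cities` table from the earlier card and deliberately assigns a Series that is one value short. The `Rank` column is hypothetical and only there to contrast with a plain list.

```python
import pandas as pd

cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# This Series has labels 0 and 1 only, so row 2 has no match and gets NaN.
cities["Area"] = pd.Series([46.87, 176.53])

# A plain list is assigned by position and must match the row count.
cities["Rank"] = [2, 1, 3]  # hypothetical column for illustration

print(cities)
```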
9
Q

How do you combine boolean series?

A

Use the bitwise operators (&, |, ~), not the logical operators (and, or, not), e.g.

df['bool_new'] = (df['bool_1'] & df['bool_2'].apply(lambda x: x > 10))
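
A small sketch of the three bitwise operators on boolean Series, with made-up column names (`flag`, `count`). Comparisons need their own parentheses because `&` and `|` bind more tightly than `>` in Python.

```python
import pandas as pd

# Hypothetical columns, used only for illustration.
df = pd.DataFrame({"flag": [True, False, True], "count": [5, 20, 30]})

both = df["flag"] & (df["count"] > 10)    # element-wise AND
either = df["flag"] | (df["count"] > 10)  # element-wise OR
negated = ~df["flag"]                     # element-wise NOT

print(both.tolist())     # [False, False, True]
print(either.tolist())   # [True, True, True]
print(negated.tolist())  # [False, True, False]

# The logical operator `and` raises ValueError, because the truth value
# of a whole Series is ambiguous.
try:
    df["flag"] and (df["count"] > 10)
except ValueError as err:
    print("ValueError:", err)
```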
10
Q

What is an ‘index’? How might you use it to reorder or randomly shuffle data?

A

DataFrames and Series have an ‘index’ property that labels each row. The labels are stable and, by default, are assigned based on the ordering at creation. You can call ‘reindex’ with a desired index ordering to reorder data, e.g.

# Reorder the 3 elements as given
cities.reindex([2, 0, 1])
# Or use numpy to randomly reorder
cities.reindex(np.random.permutation(cities.index))
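
A runnable sketch of both calls, rebuilding the `cities` table from the earlier card. It also shows what happens when `reindex` is given a label that does not exist, and uses a seeded NumPy generator (an assumption made here so the shuffle is reproducible).

```python
import numpy as np
import pandas as pd

cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# Rows follow the requested label order and keep their original labels.
reordered = cities.reindex([2, 0, 1])
print(reordered["City name"].tolist())  # ['Sacramento', 'San Francisco', 'San Jose']

# Label 3 does not exist, so reindex adds a row filled with NaN.
extended = cities.reindex([0, 3])
print(extended)

# Seeded shuffle of the existing labels.
rng = np.random.default_rng(0)
shuffled = cities.reindex(rng.permutation(cities.index.to_numpy()))
print(sorted(shuffled.index) == [0, 1, 2])  # True: same labels, new order
```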