Pandas Flashcards
How would you import pandas and see the version?
import pandas as pd
pd.__version__
What are the primary structures in pandas?
DataFrame, which is like a relational table with rows and columns, and Series, which is a single column. A DataFrame contains one or more Series.
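To make the DataFrame/Series relationship concrete, here is a minimal sketch. The column names and values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data, not from the cards above.
df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})

# Selecting a single column from a DataFrame returns a Series.
column = df["score"]

print(type(df).__name__)      # DataFrame
print(type(column).__name__)  # Series
print(df.shape)               # (rows, columns)
```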
How could you manually construct a DataFrame with the Series ‘City name’ and ‘Population’?
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])
pd.DataFrame({ 'City name': city_names, 'Population': population })
How would you read a CSV file and display basic statistics or the first few rows?
df = pd.read_csv('/file.csv', sep=",")
df.describe()  # To show summary statistics
df.head()  # To show the first few rows
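A self-contained way to try this without a file on disk is to read from an in-memory string. This is only a sketch: the CSV content below is made up and stands in for '/file.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for '/file.csv'.
csv_text = "name,age\nAnn,31\nBob,45\nCy,27\n"

# read_csv accepts a file-like object as well as a path.
df = pd.read_csv(io.StringIO(csv_text))

stats = df.describe()    # by default summarizes the numeric columns
first_two = df.head(2)   # head() returns 5 rows unless given a count

print(stats)
print(first_two)
```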
If you had a DataFrame “df” with a column “age”, how would you show a histogram of this column?
df.hist('age')
How would you divide all entries in a series by 1000?
Just use the basic math operations, e.g.
new_series = my_series / 1000
How could you perform more complex operations to map a series to a new series?
Use the Series.apply method with a lambda to convert the values, e.g.
is_over1m_series = pop_series.apply(lambda val: val > 1000000)
You can also use numpy on pandas series, e.g.
import numpy as np
np.log(pop_series)
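For a simple comparison like the one above, the operator can also be applied to the Series directly, which gives the same values as apply. A small sketch, reusing the population figures from the earlier card:

```python
import numpy as np
import pandas as pd

pop_series = pd.Series([852469, 1015785, 485199])

# apply calls the lambda once per element.
via_apply = pop_series.apply(lambda val: val > 1000000)

# The comparison operator works element-wise on the whole Series.
vectorized = pop_series > 1000000

# NumPy functions return a Series that keeps the original index.
log_pop = np.log(pop_series)

print(vectorized.tolist())
```

apply is still the tool to reach for when the per-element logic cannot be written with built-in operators or NumPy functions.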
How can you add a new series to a dataframe?
Assign to a new column name with dictionary-style syntax, e.g.
cities['Area'] = pd.Series([46.87, 176.53, 97.92])
cities['Density'] = cities['Population'] / cities['Area']
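One detail worth seeing in action: when a Series is assigned as a new column, pandas matches values by index label, not by position. The sketch below rebuilds the cities frame from the earlier card and deliberately supplies the areas in a different order with an explicit index (an assumption made for the demonstration).

```python
import pandas as pd

cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# Values are listed out of order, but the index labels say where each belongs.
cities["Area"] = pd.Series([176.53, 46.87, 97.92], index=[1, 0, 2])

# Arithmetic between columns is element-wise, row by row.
cities["Density"] = cities["Population"] / cities["Area"]

print(cities)
```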
How do you combine boolean series?
Use the bitwise operators (&, |, ~), not the logical operators (and, or, not), e.g.
df['bool_new'] = (df['bool_1'] & df['bool_2'].apply(lambda x: x > 10))
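The sketch below shows the three element-wise operators and what happens with the logical `and`. The column names and data are hypothetical. Note the parentheses around the comparison: `&` binds more tightly than `>`, so they are required.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"is_member": [True, True, False], "visits": [12, 3, 40]})

both = df["is_member"] & (df["visits"] > 10)     # element-wise AND
either = df["is_member"] | (df["visits"] > 10)   # element-wise OR
not_member = ~df["is_member"]                    # element-wise NOT

# `and` needs a single truth value for the whole Series, which pandas refuses to guess.
try:
    df["is_member"] and (df["visits"] > 10)
    raised = False
except ValueError:
    raised = True

print(both.tolist(), either.tolist(), not_member.tolist(), raised)
```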
What is an ‘index’? How might you use it to reorder or randomly shuffle data?
DataFrames and Series have an ‘index’ property that labels each row. The labels are stable (they stay attached to their rows when the data is reordered) and by default are assigned from the row order at creation. You can call ‘reindex’ with a desired index ordering to reorder data, e.g.
# Reorder the 3 elements as given
cities.reindex([2, 0, 1])
# Or use numpy to randomly reorder
cities.reindex(np.random.permutation(cities.index))
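A runnable sketch of the same idea, using a cut-down version of the cities frame. It also shows two behaviours that are easy to miss: reindex returns a new object rather than changing the original, and a label that does not exist in the index produces a row of missing values instead of an error.

```python
import numpy as np
import pandas as pd

cities = pd.DataFrame({"City name": ["San Francisco", "San Jose", "Sacramento"]})

# Rows follow the requested label order; labels stay attached to their rows.
reordered = cities.reindex([2, 0, 1])

# A random permutation of the existing labels shuffles the rows.
shuffled = cities.reindex(np.random.permutation(cities.index))

# Label 5 is not in the index, so its row is filled with NaN.
with_missing = cities.reindex([0, 5])

print(reordered)
print(with_missing)
```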