Python Transforming DataFrames Flashcards
Print the first five rows of the homelessness DataFrame.
print(homelessness.head())
Print data type info about the column types and missing values in homelessness.
homelessness.info()
Print the number of rows and columns in homelessness.
print(homelessness.shape)
Print some summary statistics that describe the homelessness DataFrame.
print(homelessness.describe())
Import pandas using the alias pd.
import pandas as pd
Print the column names of homelessness.
print(homelessness.columns)
Print the row index of homelessness.
print(homelessness.index)
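The inspection cards above can be tried end to end with a small stand-in table. This is only a sketch: the rows below are invented for illustration and are not the real homelessness data; only the column names follow the cards.

```python
import pandas as pd

# Hypothetical stand-in for the homelessness data; all values are invented.
homelessness = pd.DataFrame({
    "region": ["Mountain", "Pacific", "Mountain", "New England"],
    "state": ["Arizona", "California", "Colorado", "Maine"],
    "individuals": [7000, 100000, 7500, 1200],
    "family_members": [2500, 20000, 3000, 1000],
})

print(homelessness.head())      # up to the first five rows (all four here)
homelessness.info()             # prints its report directly and returns None
print(homelessness.shape)       # (number of rows, number of columns)
print(homelessness.describe())  # summary statistics for the numeric columns
print(homelessness.columns)     # column labels
print(homelessness.index)       # row labels (a RangeIndex by default)
```

Note that .info() writes its own output, so wrapping it in print() would also print None.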
Sort homelessness by the number of homeless individuals in ascending order, and save this as homelessness_ind.
homelessness_ind = homelessness.sort_values('individuals')
Sort homelessness by the number of homeless family_members in descending order, and save this as homelessness_fam.
homelessness_fam = homelessness.sort_values('family_members', ascending=False)
Sort homelessness first by region (ascending), and then by number of family members (descending). Save this as homelessness_reg_fam.
homelessness_reg_fam = homelessness.sort_values(['region', 'family_members'], ascending=[True, False])
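A minimal sketch of the multi-column sort, assuming a tiny invented table (the rows are hypothetical, not the real data). It shows that each entry in ascending pairs with the column at the same position in the list.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Mountain", "Pacific"],
    "state": ["Oregon", "Arizona", "Colorado", "California"],
    "family_members": [3000, 2500, 3200, 20000],
})

# region sorts A to Z; within each region, family_members sorts high to low.
homelessness_reg_fam = homelessness.sort_values(
    ["region", "family_members"], ascending=[True, False]
)
print(homelessness_reg_fam)
```

The sorted result keeps the original row labels, so the index is no longer in 0, 1, 2, 3 order.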
Create a variable called individuals that contains only the individuals column of homelessness (single brackets return a pandas Series).
individuals = homelessness["individuals"]
Create a DataFrame called state_fam that contains only the state and family_members columns of homelessness, in that order.
state_fam = homelessness[['state', 'family_members']]
Create a DataFrame called ind_state that contains the individuals and state columns of homelessness, in that order.
ind_state = homelessness[['individuals', 'state']]
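The difference between single and double brackets is easy to miss, so here is a small sketch on invented rows (hypothetical data; the names individuals_series and individuals_frame are introduced only for this example).

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
homelessness = pd.DataFrame({
    "state": ["Arizona", "California"],
    "individuals": [7000, 100000],
    "family_members": [2500, 20000],
})

# Single brackets with one label select a one-dimensional Series.
individuals_series = homelessness["individuals"]

# Double brackets (a list of labels) select a DataFrame, even for one column.
individuals_frame = homelessness[["individuals"]]

# The order of labels in the list sets the order of the columns.
ind_state = homelessness[["individuals", "state"]]

print(type(individuals_series).__name__)
print(type(individuals_frame).__name__)
print(list(ind_state.columns))
```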
Filter homelessness for cases where the number of individuals is greater than ten thousand, assigning to ind_gt_10k.
ind_gt_10k = homelessness[homelessness['individuals'] > 10000]
Filter homelessness for cases where the USA Census region is "Mountain", assigning to mountain_reg.
mountain_reg = homelessness[homelessness["region"] == "Mountain"]
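Both filters rely on a boolean mask: the comparison produces one True or False per row, and indexing with that mask keeps only the True rows. A minimal sketch on invented rows (hypothetical data; the name is_large is introduced only for this example):

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
homelessness = pd.DataFrame({
    "region": ["Mountain", "Pacific", "Mountain"],
    "state": ["Arizona", "California", "Colorado"],
    "individuals": [7000, 100000, 7500],
})

# Step 1: the comparison yields a boolean Series, one value per row.
is_large = homelessness["individuals"] > 10000
print(is_large.tolist())

# Step 2: indexing with the mask keeps only the rows marked True.
ind_gt_10k = homelessness[is_large]
mountain_reg = homelessness[homelessness["region"] == "Mountain"]

print(ind_gt_10k)
print(mountain_reg)
```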