Python Transforming DataFrames Flashcards
Print the first five rows (the head) of the homelessness DataFrame.
print(homelessness.head())
Print data type info about the column types and missing values in homelessness.
homelessness.info()  # info() prints its own report and returns None, so wrapping it in print() adds a stray "None"
Print the number of rows and columns in homelessness.
print(homelessness.shape)
Print some summary statistics that describe the homelessness DataFrame.
print(homelessness.describe())
Import pandas under its conventional alias.
import pandas as pd
Print the column names of homelessness.
print(homelessness.columns)
Print the row index of homelessness.
print(homelessness.index)
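The inspection cards above can be tried without the real dataset. This is a minimal sketch on an invented table; toy and its values are hypothetical stand-ins for homelessness.

```python
import pandas as pd

# Hypothetical stand-in for homelessness; the numbers are invented.
toy = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific"],
    "state": ["California", "Nevada", "Oregon"],
    "individuals": [300, 100, 200],
})

n_rows, n_cols = toy.shape         # shape is an attribute, so no parentheses
column_names = list(toy.columns)   # column labels, in order
row_labels = list(toy.index)       # default index counts up from 0
first_two = toy.head(2)            # head(n) returns the first n rows; n defaults to 5

print(n_rows, n_cols)
print(column_names)
print(row_labels)
```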
Sort homelessness by the number of homeless individuals, from smallest to largest, and save this as homelessness_ind.
homelessness_ind = homelessness.sort_values('individuals')
Sort homelessness by the number of homeless family_members in descending order, and save this as homelessness_fam.
homelessness_fam = homelessness.sort_values('family_members', ascending=False)
Sort homelessness first by region (ascending), and then by number of family members (descending). Save this as homelessness_reg_fam.
homelessness_reg_fam = homelessness.sort_values(['region', 'family_members'], ascending=[True, False])
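To see how the two sort keys interact, here is a minimal sketch on a made-up table. The name toy and its values are hypothetical stand-ins for homelessness, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for homelessness; the numbers are invented.
toy = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
    "state": ["Oregon", "Nevada", "California", "Utah"],
    "family_members": [50, 20, 90, 40],
})

# Region sorts A to Z first; within each region, family_members sorts high to low.
# The ascending list lines up position by position with the list of column names.
toy_sorted = toy.sort_values(["region", "family_members"], ascending=[True, False])
sorted_states = toy_sorted["state"].tolist()
print(sorted_states)
```

The second key only breaks ties within the first, so the Mountain rows all come before the Pacific rows regardless of their family_members values.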
Create a variable called individuals that contains only the individuals column of homelessness. (Single brackets return a pandas Series; use double brackets to get a one-column DataFrame.)
individuals = homelessness["individuals"]
Create a DataFrame called state_fam that contains only the state and family_members columns of homelessness, in that order.
state_fam = homelessness[['state', 'family_members']]
Create a DataFrame called ind_state that contains the individuals and state columns of homelessness, in that order.
ind_state = homelessness[['individuals', 'state']]
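The difference between single and double brackets is easy to check directly. This is a small sketch on an invented table; toy and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for homelessness; the numbers are invented.
toy = pd.DataFrame({
    "state": ["Nevada", "Utah"],
    "individuals": [100, 200],
})

as_series = toy["individuals"]    # single brackets: one column as a Series
as_frame = toy[["individuals"]]   # double brackets: a DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

The inner brackets in the second form are an ordinary Python list of column names, which is why the same syntax also selects several columns in a chosen order.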
Filter homelessness for cases where the number of individuals is greater than ten thousand, assigning to ind_gt_10k.
ind_gt_10k = homelessness[homelessness['individuals'] > 10000]
Filter homelessness for cases where the USA Census region is “Mountain”, assigning to mountain_reg.
mountain_reg = homelessness[homelessness["region"] == "Mountain"]
Filter homelessness for cases where the number of family_members is less than one thousand and the region is “Pacific”, assigning to fam_lt_1k_pac.
fam_lt_1k_pac = homelessness[(homelessness['family_members'] < 1000) & (homelessness['region'] == 'Pacific')]
Filter homelessness for cases where the USA census region is “South Atlantic” or “Mid-Atlantic”, assigning to south_mid_atlantic.
south_mid_atlantic = homelessness[(homelessness['region'] == 'South Atlantic') | (homelessness['region'] == 'Mid-Atlantic')]
Filter homelessness for cases where the state is in canu, the list of Mojave states defined below, assigning to mojave_homelessness.
canu = ["California", "Arizona", "Nevada", "Utah"]
mojave_homelessness = homelessness[homelessness['state'].isin(canu)]
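The filtering cards above all follow one pattern: build a boolean Series, then index the DataFrame with it. A minimal sketch on invented data (toy and its values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for homelessness; the numbers are invented.
toy = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "South Atlantic"],
    "state": ["California", "Nevada", "Oregon", "Florida"],
    "family_members": [2000, 500, 800, 3000],
})

# Each comparison needs its own parentheses because & and | bind
# more tightly than < and == in Python.
small_pacific = toy[(toy["family_members"] < 1000) & (toy["region"] == "Pacific")]

# isin() replaces a chain of == comparisons joined by |.
pac_or_mountain = toy[toy["region"].isin(["Pacific", "Mountain"])]

print(small_pacific["state"].tolist())
print(pac_or_mountain["state"].tolist())
```

Using the Python keywords and / or in place of & / | raises a ValueError here, because a whole Series has no single truth value.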
Add a column to homelessness, indiv_per_10k, containing the number of homeless individuals per ten thousand people in each state.
homelessness["indiv_per_10k"] = 10000 * homelessness['individuals'] / homelessness['state_pop']
Subset rows where indiv_per_10k is higher than 20, assigning to high_homelessness.
high_homelessness = homelessness[homelessness["indiv_per_10k"] > 20]
Sort high_homelessness by descending indiv_per_10k, assigning to high_homelessness_srt.
high_homelessness_srt = high_homelessness.sort_values('indiv_per_10k', ascending=False)
Select only the state and indiv_per_10k columns of high_homelessness_srt and save as result. Look at the result.
result = high_homelessness_srt[['state', 'indiv_per_10k']]
print(result)
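The last four cards chain together: add a column, filter on it, sort, then select columns. Here is the same sequence as one runnable sketch on invented data. The table toy, its state labels, and its numbers are hypothetical, chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical stand-in for homelessness; the numbers are invented.
toy = pd.DataFrame({
    "state": ["A", "B", "C"],
    "individuals": [100, 900, 400],
    "state_pop": [100000, 300000, 100000],
})

# 1. Add a column: homeless individuals per ten thousand residents.
toy["indiv_per_10k"] = 10000 * toy["individuals"] / toy["state_pop"]

# 2. Keep only rows above the threshold of 20.
high = toy[toy["indiv_per_10k"] > 20]

# 3. Sort the remaining rows from highest to lowest rate.
high_srt = high.sort_values("indiv_per_10k", ascending=False)

# 4. Select the two columns of interest.
toy_result = high_srt[["state", "indiv_per_10k"]]
print(toy_result)
```

State A works out to 10 per ten thousand and is dropped by the filter, leaving C (40) ahead of B (30).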