Python Transforming DataFrames Flashcards

1
Q

Print the first four rows of the homelessness DataFrame.

A

print(homelessness.head(4))

2
Q

Print information about the column data types and missing values in homelessness.

A

homelessness.info()  # info() prints its summary itself and returns None

3
Q

Print the number of rows and columns in homelessness.

A

print(homelessness.shape)

4
Q

Print some summary statistics that describe the homelessness DataFrame.

A

print(homelessness.describe())

5
Q

Import the pandas library using the alias pd.

A

import pandas as pd

6
Q

Print the column names of homelessness.

A

print(homelessness.columns)

7
Q

Print the row index of homelessness.

A

print(homelessness.index)
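
Note: the inspection calls on the cards so far can be tried on a small stand-in table. This is a minimal sketch: the column names follow the cards, but every value below is invented for illustration and is not the real homelessness data.

```python
import pandas as pd

# Toy stand-in for homelessness; all values are invented for illustration.
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "South Atlantic", "Mountain"],
    "state": ["Alaska", "Arizona", "California", "Florida", "Utah"],
    "individuals": [1400, 7200, 109000, 21400, 1900],
    "family_members": [580, 2600, 20900, 9500, 970],
    "state_pop": [735000, 7160000, 39460000, 21240000, 3150000],
})

first_rows = homelessness.head(4)        # head(n) returns the first n rows
n_rows, n_cols = homelessness.shape      # shape is a (rows, columns) tuple
col_names = list(homelessness.columns)   # column labels
row_index = homelessness.index           # a RangeIndex when no index is set

print(first_rows)
print(n_rows, n_cols)
print(col_names)
print(row_index)
```

Note that shape, columns, and index are attributes, so they take no parentheses, while head() is a method.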

8
Q

Sort homelessness by the individuals column (ascending by default) and save the result as homelessness_ind.

A

homelessness_ind = homelessness.sort_values('individuals')

9
Q

Sort homelessness by the number of homeless family_members in descending order, and save this as homelessness_fam.

A

homelessness_fam = homelessness.sort_values('family_members', ascending=False)

10
Q

Sort homelessness first by region (ascending), and then by number of family members (descending). Save this as homelessness_reg_fam.

A

homelessness_reg_fam = homelessness.sort_values(['region', 'family_members'], ascending=[True, False])
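
Note: a minimal sketch of sorting on two keys, using a made-up four-row table (the one-letter state names and the counts are invented for illustration). The ascending list is matched to the column list position by position.

```python
import pandas as pd

# Invented toy data; only the column names mirror the cards.
df = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
    "state": ["A", "B", "C", "D"],
    "family_members": [500, 300, 900, 700],
})

# region ascending first, then family_members descending within each region.
sorted_df = df.sort_values(["region", "family_members"], ascending=[True, False])
print(sorted_df)
```

sort_values() returns a new DataFrame and leaves df unchanged. The rows keep their original index labels; call .reset_index(drop=True) on the result if a fresh 0..n-1 index is wanted.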

11
Q

Select only the individuals column of homelessness and save it as individuals.

A

individuals = homelessness["individuals"]  # single brackets return a Series; use [["individuals"]] for a one-column DataFrame
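
Note: the difference between single and double brackets is easy to check on a tiny made-up table (values invented for illustration). A minimal sketch:

```python
import pandas as pd

# Invented two-row table for illustration.
df = pd.DataFrame({"state": ["A", "B"], "individuals": [10, 20]})

as_series = df["individuals"]    # a column label gives a Series
as_frame = df[["individuals"]]   # a list of labels gives a DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```

The outer brackets do the subsetting and the inner brackets build a Python list of column names, which is why selecting several columns always uses the double-bracket form.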

12
Q

Create a DataFrame called state_fam that contains only the state and family_members columns of homelessness, in that order.

A

state_fam = homelessness[['state', 'family_members']]

13
Q

Create a DataFrame called ind_state that contains the individuals and state columns of homelessness, in that order.

A

ind_state = homelessness[['individuals', 'state']]

14
Q

Filter homelessness for cases where the number of individuals is greater than ten thousand, assigning the result to ind_gt_10k.

A

ind_gt_10k = homelessness[homelessness['individuals'] > 10000]

15
Q

Filter homelessness for cases where the US Census region is “Mountain”, assigning the result to mountain_reg.

A

mountain_reg = homelessness[homelessness["region"] == "Mountain"]

16
Q

Filter homelessness for cases where the number of family_members is less than one thousand and the region is “Pacific”, assigning the result to fam_lt_1k_pac.

A

fam_lt_1k_pac = homelessness[(homelessness['family_members'] < 1000) & (homelessness['region'] == 'Pacific')]

17
Q

Filter homelessness for cases where the US Census region is “South Atlantic” or “Mid-Atlantic”, assigning the result to south_mid_atlantic.

A

south_mid_atlantic = homelessness[(homelessness['region'] == 'South Atlantic') | (homelessness['region'] == 'Mid-Atlantic')]
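
Note: a minimal sketch of combining conditions with & (and) and | (or), on a made-up table whose values are invented for illustration. Each comparison needs its own parentheses because & and | bind more tightly than < and == in Python.

```python
import pandas as pd

# Invented toy data; only the column names mirror the cards.
df = pd.DataFrame({
    "region": ["Pacific", "Pacific", "Mountain", "Mid-Atlantic", "South Atlantic"],
    "state": ["A", "B", "C", "D", "E"],
    "family_members": [500, 1500, 200, 800, 3000],
})

# Naming each boolean mask first can make long filters easier to read.
small_fam = df["family_members"] < 1000
is_pacific = df["region"] == "Pacific"

both = df[small_fam & is_pacific]
either = df[(df["region"] == "South Atlantic") | (df["region"] == "Mid-Atlantic")]

print(both)
print(either)
```

The Python keywords and / or do not work element-wise on a Series, so the operators & and | are used instead.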

18
Q

Filter homelessness for cases where the state is in the list of Mojave states, canu, assigning the result to mojave_homelessness. The list is defined as:
canu = ["California", "Arizona", "Nevada", "Utah"]

A

mojave_homelessness = homelessness[homelessness['state'].isin(canu)]
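
Note: a minimal sketch of .isin() on a made-up four-row table (the counts are invented for illustration). The method returns a boolean Series, which can also be negated with ~ to keep the rows that are not in the list.

```python
import pandas as pd

# Invented toy data for illustration.
df = pd.DataFrame({
    "state": ["California", "Texas", "Nevada", "Ohio"],
    "individuals": [4, 3, 2, 1],
})
canu = ["California", "Arizona", "Nevada", "Utah"]

in_mojave = df["state"].isin(canu)   # True where state appears in canu
mojave = df[in_mojave]
not_mojave = df[~in_mojave]          # ~ flips the boolean mask

print(mojave)
print(not_mojave)
```

This is usually shorter than chaining one == comparison per state with |.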

19
Q

Add a column to homelessness, indiv_per_10k, containing the number of homeless individuals per ten thousand people in each state.

A

homelessness['indiv_per_10k'] = 10000 * homelessness['individuals'] / homelessness['state_pop']

20
Q

Subset rows where indiv_per_10k is higher than 20, assigning to high_homelessness.

A

high_homelessness = homelessness[homelessness['indiv_per_10k'] > 20]

21
Q

Sort high_homelessness by descending indiv_per_10k, assigning to high_homelessness_srt.

A

high_homelessness_srt = high_homelessness.sort_values('indiv_per_10k', ascending=False)

22
Q

Select only the state and indiv_per_10k columns of high_homelessness_srt and save as result. Look at the result.

A

result = high_homelessness_srt[['state', 'indiv_per_10k']]
print(result)
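
Note: the last four cards form one pipeline (add a column, filter, sort, select). A minimal sketch of the whole sequence follows, run on a made-up table: the one-letter state names, counts, and populations are invented for illustration and are not the real homelessness data.

```python
import pandas as pd

# Invented toy data; only the column names mirror the cards.
homelessness = pd.DataFrame({
    "state": ["A", "B", "C", "D"],
    "individuals": [3000, 500, 16000, 800],
    "state_pop": [1_000_000, 500_000, 4_000_000, 2_000_000],
})

# 1. Add a derived column: individuals per ten thousand residents.
homelessness["indiv_per_10k"] = 10000 * homelessness["individuals"] / homelessness["state_pop"]

# 2. Keep the rows where the rate is above 20.
high_homelessness = homelessness[homelessness["indiv_per_10k"] > 20]

# 3. Sort those rows from the highest rate to the lowest.
high_homelessness_srt = high_homelessness.sort_values("indiv_per_10k", ascending=False)

# 4. Select the two columns of interest.
result = high_homelessness_srt[["state", "indiv_per_10k"]]
print(result)
```

With these invented numbers the rates are 30, 10, 40, and 4, so only states A and C pass the filter and C is listed first.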
