Pandas Flashcards

Question 1

Q

Dataframes are the pandas equivalent of a NumPy 2D ndarray, with a few key differences. What are they?

Answer

A

Axis values can have string labels, not just numeric ones.

Dataframes can contain columns with multiple data types, including integer, float, and string.
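
As a small illustration of both points, the sketch below contrasts a NumPy array with a toy dataframe. The company names and numbers are made up for the example and are not taken from f500.csv.

```python
import numpy as np
import pandas as pd

# A 2D ndarray holds a single dtype and is addressed by integer position.
arr = np.array([[1, 2.5], [3, 4.5]])

# Hypothetical toy data: string row labels and mixed column types.
df = pd.DataFrame(
    {"rank": [1, 2], "revenues": [10.5, 8.25], "country": ["USA", "China"]},
    index=["Company A", "Company B"],
)

print(arr.dtype)   # one dtype for the whole array
print(df.dtypes)   # one dtype per column
print(df.loc["Company B", "country"])  # select by string labels
```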

Question 2

Q

Use pandas to read the CSV file f500.csv into a dataframe named f500, using the first column as the index.

Answer

A

import pandas as pd

f500 = pd.read_csv("f500.csv", index_col=0)
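
To see what index_col=0 does without the real file, here is a minimal sketch that reads a tiny in-memory CSV. The text in csv_text is hypothetical stand-in data, not the contents of f500.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "company,rank,country\nCompany A,1,USA\nCompany B,2,China\n"

# index_col=0 turns the first column into the row labels
# instead of leaving it as an ordinary data column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.index.tolist())
print(df.columns.tolist())
```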

Question 3

Q

Use the DataFrame.shape attribute to assign the shape of f500 to f500_shape.

Answer

A

f500_shape = f500.shape

Question 4

Q

Use Python’s type() function to assign the type of f500 to f500_type.

Answer

A

f500_type = type(f500)

Question 5

Q

Use the head() method to select the first 6 rows of f500. Assign the result to f500_head.

Answer

A

f500_head = f500.head(6)

Question 6

Q

Use the tail() method to select the last 8 rows of f500. Assign the result to f500_tail.

Answer

A

f500_tail = f500.tail(8)

Question 7

Q

Use the DataFrame.info() method to display information about the f500 dataframe.

Answer

A

f500.info()

Question 8

Q

Select the industry column of f500. Assign the result to the variable name industries.

Answer

A

industries = f500["industry"]

Question 9

Q

In order, select the revenues and years_on_global_500_list columns. Assign the result to the variable name revenues_years.

Answer

A

revenues_years = f500[["revenues", "years_on_global_500_list"]]

Question 10

Q

In order, select all columns from ceo up to and including sector. Assign the result to the variable name ceo_to_sector.

Answer

A

ceo_to_sector = f500.loc[:, "ceo":"sector"]
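
The "up to and including" wording matters: a label slice with loc includes its end label, unlike positional slicing. A minimal sketch on a toy dataframe (column names mirror f500, values are made up; the iloc line is added only for contrast):

```python
import pandas as pd

# Hypothetical toy dataframe; values are invented for illustration.
df = pd.DataFrame(
    {
        "rank": [1, 2],
        "ceo": ["A. Smith", "B. Jones"],
        "industry": ["Retail", "Energy"],
        "sector": ["Retailing", "Energy"],
        "country": ["USA", "China"],
    },
    index=["Company A", "Company B"],
)

by_label = df.loc[:, "ceo":"sector"]  # end label "sector" is included
by_position = df.iloc[:, 1:3]         # end position 3 is excluded

print(by_label.columns.tolist())
print(by_position.columns.tolist())
```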

Question 11

Q

By selecting data from f500:
Create a new variable toyota, with:
Just the row with index Toyota Motor.
All columns.

Answer

A

toyota = f500.loc["Toyota Motor"]

Question 12

Q

By selecting data from f500: Create a new variable, drink_companies, with:
Rows with indices Anheuser-Busch InBev, Coca-Cola, and Heineken Holding, in that order.
All columns.

Answer

A

drink_companies = f500.loc[["Anheuser-Busch InBev", "Coca-Cola", "Heineken Holding"]]
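
The two row selections above return different object types: a single label gives a Series, while a list of labels gives a DataFrame with rows in the order listed. A short sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy dataframe; names and values are invented.
df = pd.DataFrame(
    {"rank": [1, 2, 3], "country": ["USA", "China", "Japan"]},
    index=["Company A", "Company B", "Company C"],
)

one_row = df.loc["Company C"]                  # single label -> Series
two_rows = df.loc[["Company C", "Company A"]]  # list of labels -> DataFrame

print(type(one_row).__name__)
print(two_rows.index.tolist())  # rows follow the order of the list
```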

Question 13

Q

By selecting data from f500: Create a new variable, middle_companies, with:
All rows with indices from Tata Motors to Nationwide, inclusive.
All columns from rank to country, inclusive.

Answer

A

middle_companies = f500.loc["Tata Motors":"Nationwide", "rank":"country"]

Question 14

Q

We’ve already saved a selection of data from f500 to a dataframe named f500_sel.

Find the counts of each unique value in the country column in the f500_sel dataframe.
Select the country column in the f500_sel dataframe. Assign it to a variable named countries.
Use the Series.value_counts() method to return the value counts for countries. Assign the results to country_counts.

Answer

A

countries = f500_sel["country"]

country_counts = countries.value_counts()

print(country_counts)
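
For a feel of what value_counts() returns, here is a minimal sketch on a hand-made series (the values are hypothetical, not from f500_sel). The result is itself a Series whose index holds the unique values, sorted by count in descending order by default.

```python
import pandas as pd

# Hypothetical country values standing in for the real column.
countries = pd.Series(["USA", "China", "USA", "Japan", "USA", "China"])

country_counts = countries.value_counts()

print(country_counts)
print(country_counts.loc["China"])  # look up one count by its label
```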

Question 15

Q

From the pandas series countries_counts:

Select the item at index label India. Assign the result to the variable name india.

Answer

A

countries = f500['country']
countries_counts = countries.value_counts()

india = countries_counts.loc["India"]

Question 16

Q

From the pandas series countries_counts: In order, select the items with index labels USA, Canada, and Mexico. Assign the result to the variable name north_america.

Answer

A

north_america = countries_counts[["USA", "Canada", "Mexico"]]

Question 17

Q

By selecting data from f500:

Create a new variable big_movers, with:
Rows with indices Aviva, HP, JD.com, and BHP Billiton, in that order.
The rank and previous_rank columns, in that order.

Answer

A

big_movers = f500.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], ["rank", "previous_rank"]]

Question 18

Q

By selecting data from f500:

Create a new variable, bottom_companies with:
All rows with indices from National Grid to AutoNation, inclusive.
The rank, sector, and country columns.

Answer

A

bottom_companies = f500.loc["National Grid":"AutoNation", ["rank", "sector", "country"]]

Question 19

Q

In f500, subtract the values in the rank column from the values in the previous_rank column. Assign the result to rank_change.

Answer

A

rank_change = f500["previous_rank"] - f500["rank"]
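
Subtracting one column from another works element by element, matched on the row labels, so no loop is needed. A minimal sketch with invented ranks, where a positive result means the company moved up the list:

```python
import pandas as pd

# Hypothetical toy data; company names and ranks are made up.
df = pd.DataFrame(
    {"rank": [1, 2, 3], "previous_rank": [3, 2, 1]},
    index=["Company A", "Company B", "Company C"],
)

# Row-by-row subtraction; the result keeps the same index.
rank_change = df["previous_rank"] - df["rank"]

print(rank_change)
print(rank_change.max(), rank_change.min())
```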

Question 20

Q

Use the Series.max() method to find the maximum value for the rank_change series. Assign the result to the variable rank_change_max.

Answer

A

rank_change = f500["previous_rank"] - f500["rank"]
rank_change_max = rank_change.max()

Question 21

Q

Use the Series.min() method to find the minimum value for the rank_change series. Assign the result to the variable rank_change_min.

Answer

A

rank_change_min = rank_change.min()

Question 22

Q

Return a series of descriptive statistics for the rank column in f500.
Select the rank column. Assign it to a variable named rank.
Use the Series.describe() method to return a series of statistics for rank. Assign the result to rank_desc.

Answer

A

rank = f500["rank"]
rank_desc = rank.describe()

Question 23

Q

Use Series.value_counts() and Series.loc to return the number of companies with a value of 0 in the previous_rank column in the f500 dataframe. Assign the results to zero_previous_rank.

Answer

A

zero_previous_rank = f500["previous_rank"].value_counts().loc[0]

Question 24

Q

Use the DataFrame.max() method to find the maximum value for only the numeric columns from f500 (you may need to check the documentation). Assign the result to the variable max_f500.

Answer

A

max_f500 = f500.max(numeric_only=True)
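
A small sketch of what numeric_only=True changes, using a made-up dataframe: the string column is left out of the result, which is a Series indexed by the remaining column names.

```python
import pandas as pd

# Hypothetical toy data with two numeric columns and one string column.
df = pd.DataFrame(
    {"rank": [1, 2], "revenues": [10.5, 8.25], "country": ["USA", "China"]}
)

max_numeric = df.max(numeric_only=True)

print(max_numeric)  # "country" does not appear in the result
```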

Question 25

Q

Return a dataframe of descriptive statistics for all of the numeric columns in f500. Assign the result to f500_desc.

Answer

A

f500_desc = f500.describe()

Question 26

Q

The company “Dow Chemical” has named a new CEO. Update the value where the row label is Dow Chemical and for the ceo column to Jim Fitterling in the f500 dataframe.

Answer

A

f500.loc["Dow Chemical", "ceo"] = "Jim Fitterling"

Question 27

Q

Create a boolean series, motor_bool, that compares whether the values in the industry column from the f500 dataframe are equal to “Motor Vehicles and Parts”

Answer

A

motor_bool = f500["industry"] == "Motor Vehicles and Parts"

Question 28

Q

Use the motor_bool boolean series to index the country column. Assign the result to motor_countries.

Answer

A

motor_countries = f500.loc[motor_bool, "country"]
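
The two steps above (build a boolean series, then use it as a row filter) can be sketched on a toy dataframe. The rows are hypothetical; only the column names come from f500.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration.
df = pd.DataFrame(
    {
        "industry": ["Motor Vehicles and Parts", "Banking", "Motor Vehicles and Parts"],
        "country": ["Japan", "USA", "Germany"],
    },
    index=["Company A", "Company B", "Company C"],
)

# The comparison yields one True/False per row.
motor_bool = df["industry"] == "Motor Vehicles and Parts"

# loc keeps only the rows where the mask is True.
motor_countries = df.loc[motor_bool, "country"]

print(motor_bool)
print(motor_countries)
```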

Question 29

Q

Use boolean indexing to update values in the previous_rank column of the f500 dataframe:
There should now be a value of np.nan where there previously was a value of 0.

import numpy as np
prev_rank_before = f500["previous_rank"].value_counts(dropna=False).head()

Answer

A

f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan
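
A minimal sketch of the same replacement on made-up data. Because np.nan is a float, the toy column is created as float up front; depending on the pandas version, writing NaN into an integer column may warn or be refused, so this is an assumption made to keep the example portable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column is float so it can hold NaN.
df = pd.DataFrame(
    {"previous_rank": [0.0, 5.0, 0.0, 2.0]},
    index=["Company A", "Company B", "Company C", "Company D"],
)

# Boolean mask on the left of the assignment picks the rows to overwrite.
df.loc[df["previous_rank"] == 0, "previous_rank"] = np.nan

# dropna=False makes value_counts() report the NaN entries too.
counts = df["previous_rank"].value_counts(dropna=False)
print(counts)
```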

Question 30

Q

Create a new pandas series, prev_rank_after, using the same syntax that was used to create the prev_rank_before series.

Answer

A

prev_rank_after = f500["previous_rank"].value_counts(dropna=False).head()

Question 31

Q

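Add a new column named rank_change to the f500 dataframe by subtracting the values in the rank column from the values in the previous_rank column. Then use the Series.describe() method to return descriptive statistics for the new column. Assign the result to rank_change_desc.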

Answer

A

f500["rank_change"] = f500["previous_rank"] - f500["rank"]
rank_change_desc = f500["rank_change"].describe()

Question 32

Q

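Find the two most common values in the industry column for companies whose country is USA. Assign the result to industry_usa. Then find the three most common values in the sector column for companies whose country is China. Assign the result to sector_china.

Answer

A

industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)
sector_china = f500["sector"][f500["country"] == "China"].value_counts().head(3)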