Pandas Flashcards

1
Q

Dataframes are the pandas equivalent of a NumPy 2D ndarray, with a few key differences:

A

Axis values can have string labels, not just numeric ones.

Dataframes can contain columns of different data types, including integer, float, and string.
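
A minimal sketch of both differences, using made-up company names and values rather than the real f500 data:

```python
import numpy as np
import pandas as pd

# A NumPy 2D array holds one dtype and is indexed by integer position.
arr = np.array([[1, 2.5], [3, 4.5]])

# Hypothetical toy data: string labels on both axes,
# and a different dtype in each column.
df = pd.DataFrame(
    {"rank": [1, 2], "revenues": [485.9, 315.2], "country": ["USA", "China"]},
    index=["Company A", "Company B"],
)

print(arr.dtype)                        # one dtype for the whole array
print(df.dtypes)                        # one dtype per column
print(df.loc["Company B", "country"])   # access by string labels
```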

2
Q

Read CSV with pandas

A

import pandas as pd

f500 = pd.read_csv("f500.csv", index_col=0)
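
To see what index_col=0 does without the real file, here is a small sketch that reads hypothetical CSV text from memory. The column names and rows are invented for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-in for f500.csv: the first column holds company names.
csv_text = "company,rank,country\nWalmart,1,USA\nState Grid,2,China\n"

# index_col=0 turns the first column into the row labels
# instead of the default 0, 1, 2, ...
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(demo.index.tolist())
print(demo.columns.tolist())
```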

3
Q

Use the DataFrame.shape attribute to assign the shape of f500 to f500_shape.

A

f500_shape = f500.shape

4
Q

Use Python’s type() function to assign the type of f500 to f500_type.

A

f500_type = type(f500)

5
Q

Use the head() method to select the first 6 rows of f500. Assign the result to f500_head.

A

f500_head = f500.head(6)

6
Q

Use the tail() method to select the last 8 rows of f500. Assign the result to f500_tail.

A

f500_tail = f500.tail(8)

7
Q

Use the DataFrame.info() method to display information about the f500 dataframe.

A

f500.info()

8
Q

Select the industry column of f500. Assign the result to the variable name industries.

A

industries = f500["industry"]

9
Q

In order, select the revenues and years_on_global_500_list columns. Assign the result to the variable name revenues_years.

A

revenues_years = f500[["revenues", "years_on_global_500_list"]]

10
Q

In order, select all columns from ceo up to and including sector. Assign the result to the variable name ceo_to_sector.

A

ceo_to_sector = f500.loc[:, "ceo":"sector"]

11
Q

By selecting data from f500:
Create a new variable toyota, with:
Just the row with index Toyota Motor.
All columns.

A

toyota = f500.loc["Toyota Motor"]

12
Q

By selecting data from f500: Create a new variable, drink_companies, with:
Rows with indices Anheuser-Busch InBev, Coca-Cola, and Heineken Holding, in that order.
All columns.

A

drink_companies = f500.loc[["Anheuser-Busch InBev", "Coca-Cola", "Heineken Holding"]]

13
Q

By selecting data from f500: Create a new variable, middle_companies, with:
All rows with indices from Tata Motors to Nationwide, inclusive.
All columns from rank to country, inclusive.

A

middle_companies = f500.loc["Tata Motors":"Nationwide", "rank":"country"]
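
A small sketch of the same kind of label slice on a made-up dataframe (the labels A to D and all values are hypothetical). It shows that a .loc slice includes both end labels, on rows and on columns:

```python
import pandas as pd

# Hypothetical toy data with string row labels.
demo = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4],
        "revenues": [10, 20, 30, 40],
        "country": ["USA", "China", "Japan", "USA"],
    },
    index=["A", "B", "C", "D"],
)

# Rows B through C and columns rank through revenues, end labels included.
middle = demo.loc["B":"C", "rank":"revenues"]

print(middle)
```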

14
Q

We’ve already saved a selection of data from f500 to a dataframe named f500_sel.

Find the counts of each unique value in the country column in the f500_sel dataframe.
Select the country column in the f500_sel dataframe. Assign it to a variable named countries.
Use the Series.value_counts() method to return the value counts for countries. Assign the results to country_counts.

A

countries = f500_sel["country"]

country_counts = countries.value_counts()

print(country_counts)
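
For a quick feel of the result, here is a sketch on a hand-made series (the values are invented, not taken from f500_sel). Series.value_counts() returns a series indexed by the unique values, sorted from most to least frequent:

```python
import pandas as pd

# Hypothetical country column.
countries = pd.Series(["USA", "China", "USA", "Japan", "USA", "China"])

# Unique values become the index; counts are sorted in descending order.
country_counts = countries.value_counts()

print(country_counts)
```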

15
Q

From the pandas series countries_counts:

Select the item at index label India. Assign the result to the variable name india.

A
countries = f500['country']
countries_counts = countries.value_counts()

india = countries_counts.loc["India"]

16
Q

From the pandas series countries_counts: In order, select the items with index labels USA, Canada, and Mexico. Assign the result to the variable name north_america.

A

north_america = countries_counts[["USA", "Canada", "Mexico"]]
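
A sketch of single-label and list-of-labels selection on a series, using invented counts in place of the real countries_counts. A single label gives back one value, while a list of labels gives back a series in the order requested:

```python
import pandas as pd

# Hypothetical counts; the real values come from f500["country"].value_counts().
counts = pd.Series({"USA": 5, "India": 3, "Canada": 2, "Mexico": 1})

# A single label returns a single value.
india = counts.loc["India"]

# A list of labels returns a series, in the order of the list.
north_america = counts[["USA", "Canada", "Mexico"]]

print(india)
print(north_america)
```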

17
Q

By selecting data from f500:

Create a new variable big_movers, with:
Rows with indices Aviva, HP, JD.com, and BHP Billiton, in that order.
The rank and previous_rank columns, in that order.

A

big_movers = f500.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], ["rank", "previous_rank"]]

18
Q

By selecting data from f500:

Create a new variable, bottom_companies with:
All rows with indices from National Grid to AutoNation, inclusive.
The rank, sector, and country columns.

A

bottom_companies = f500.loc["National Grid":"AutoNation", ["rank", "sector", "country"]]

19
Q

In f500, subtract the values in the rank column from the values in the previous_rank column. Assign the result to rank_change.

A

rank_change = f500["previous_rank"] - f500["rank"]
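
A sketch of the same subtraction on three made-up companies, to show that the operation is element-wise and keeps the row labels. The ranks are invented for illustration:

```python
import pandas as pd

# Made-up ranks for three hypothetical companies.
demo = pd.DataFrame(
    {"rank": [1, 2, 3], "previous_rank": [1, 5, 2]},
    index=["A", "B", "C"],
)

# Subtraction is element-wise, matched on the row labels.
rank_change = demo["previous_rank"] - demo["rank"]

print(rank_change)
print(rank_change.max(), rank_change.min())
```

A positive value here means the hypothetical company moved up the list; a negative value means it moved down.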

20
Q

Use the Series.max() method to find the maximum value for the rank_change series. Assign the result to the variable rank_change_max.

A
rank_change = f500["previous_rank"] - f500["rank"]
rank_change_max = rank_change.max()
21
Q

Use the Series.min() method to find the minimum value for the rank_change series. Assign the result to the variable rank_change_min.

A

rank_change_min = rank_change.min()

22
Q

Return a series of descriptive statistics for the rank column in f500.
Select the rank column. Assign it to a variable named rank.
Use the Series.describe() method to return a series of statistics for rank. Assign the result to rank_desc.

A
rank = f500["rank"]
rank_desc = rank.describe()
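
For reference, a sketch of what Series.describe() returns for a numeric series, using a four-value series invented for illustration:

```python
import pandas as pd

# Hypothetical numeric column.
rank = pd.Series([1, 2, 3, 4], name="rank")

# For numeric data, describe() returns count, mean, std, min,
# the 25%/50%/75% quantiles, and max as a series.
rank_desc = rank.describe()

print(rank_desc)
```
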
23
Q

Use Series.value_counts() and Series.loc to return the number of companies with a value of 0 in the previous_rank column in the f500 dataframe. Assign the results to zero_previous_rank.

A

zero_previous_rank = f500["previous_rank"].value_counts().loc[0]

24
Q

Use the DataFrame.max() method to find the maximum value for only the numeric columns from f500 (you may need to check the documentation). Assign the result to the variable max_f500.

A

max_f500 = f500.max(numeric_only=True)
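
A sketch of numeric_only=True on a made-up dataframe with one string column (all values are hypothetical). The string column is left out of the result:

```python
import pandas as pd

# Hypothetical toy data: two numeric columns and one string column.
demo = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenues": [485.9, 315.2, 267.5],
        "country": ["USA", "China", "China"],
    }
)

# Only the numeric columns appear in the result.
max_demo = demo.max(numeric_only=True)

print(max_demo)
```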

25
Q

Return a dataframe of descriptive statistics for all of the numeric columns in f500. Assign the result to f500_desc.

A

f500_desc = f500.describe()

26
Q

The company “Dow Chemical” has named a new CEO. Update the value where the row label is Dow Chemical and for the ceo column to Jim Fitterling in the f500 dataframe.

A

f500.loc["Dow Chemical", "ceo"] = "Jim Fitterling"

27
Q

Create a boolean series, motor_bool, that compares whether the values in the industry column from the f500 dataframe are equal to “Motor Vehicles and Parts”

A

motor_bool = f500["industry"] == "Motor Vehicles and Parts"

28
Q

Use the motor_bool boolean series to index the country column. Assign the result to motor_countries.

A

motor_countries = f500.loc[motor_bool, "country"]
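
A sketch of the two steps together on three made-up rows (the labels and countries are hypothetical). The comparison produces one True/False value per row, and .loc keeps only the rows where the mask is True:

```python
import pandas as pd

# Hypothetical toy data.
demo = pd.DataFrame(
    {
        "industry": ["Motor Vehicles and Parts", "Banks", "Motor Vehicles and Parts"],
        "country": ["Japan", "China", "Germany"],
    },
    index=["A", "B", "C"],
)

# One boolean per row.
motor_bool = demo["industry"] == "Motor Vehicles and Parts"

# Rows where the mask is True, country column only.
motor_countries = demo.loc[motor_bool, "country"]

print(motor_bool)
print(motor_countries)
```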

29
Q

Use boolean indexing to update values in the previous_rank column of the f500 dataframe:
There should now be a value of np.nan where there previously was a value of 0.

import numpy as np
prev_rank_before = f500["previous_rank"].value_counts(dropna=False).head()

A

f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan
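
A sketch of the same update on a made-up column. The column is stored as float here, which is an assumption of this example, so that NaN fits without a change of dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; floats so the column can hold NaN as-is.
demo = pd.DataFrame({"previous_rank": [1.0, 0.0, 3.0, 0.0]}, index=list("ABCD"))

# Boolean mask selects the rows to change; the column label selects the cell.
is_zero = demo["previous_rank"] == 0
demo.loc[is_zero, "previous_rank"] = np.nan

# dropna=False keeps NaN in the counts.
print(demo["previous_rank"].value_counts(dropna=False))
```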

30
Q

Create a new pandas series, prev_rank_after, using the same syntax that was used to create the prev_rank_before series.

A

prev_rank_after = f500["previous_rank"].value_counts(dropna=False).head()

31
Q

Add a new column named rank_change to the f500 dataframe by subtracting the values in the rank column from the values in the previous_rank column.

A
f500["rank_change"] = f500["previous_rank"] - f500["rank"]
rank_change_desc = f500["rank_change"].describe()
32
Q

industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)

A

sector_china = f500["sector"][f500["country"] == "China"].value_counts().head(3)
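
Both lines on this card follow one pattern: filter a column with a boolean mask built from another column, count the values, and keep the top few. A sketch of that pattern on made-up rows (the industries and countries are hypothetical):

```python
import pandas as pd

# Hypothetical toy data.
demo = pd.DataFrame(
    {
        "country": ["USA", "USA", "USA", "China", "USA", "USA", "USA"],
        "industry": ["Banks", "Tech", "Banks", "Banks", "Retail", "Banks", "Tech"],
    }
)

# Keep the industry values for USA rows, count them, take the two most common.
top_two = demo["industry"][demo["country"] == "USA"].value_counts().head(2)

# The same selection written with .loc.
top_two_loc = demo.loc[demo["country"] == "USA", "industry"].value_counts().head(2)

print(top_two)
```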