Pandas Flashcards
DataFrames are the pandas equivalent of a NumPy 2D ndarray, with a few key differences:
Axis values can have string labels, not just numeric ones.
DataFrames can contain columns of different data types, including integer, float, and string.
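A minimal sketch contrasting the two structures. The company names and values are hypothetical sample data, not taken from f500.csv.

```python
import numpy as np
import pandas as pd

# A NumPy 2D array stores one dtype and is indexed by position only.
arr = np.array([[1, 2.5], [3, 4.0]])
print(arr.dtype)  # float64: the integers were upcast to a common type

# A DataFrame keeps a dtype per column and can use string row labels.
# Company names and values are hypothetical sample data.
df = pd.DataFrame(
    {"rank": [1, 2], "revenues": [100.0, 80.5], "country": ["USA", "China"]},
    index=["Company A", "Company B"],
)
print(df.dtypes)                       # rank is integer, revenues float, country text
print(df.loc["Company B", "country"])  # China
```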
Read a CSV file with pandas, using the first column as the row index.
import pandas as pd
f500 = pd.read_csv("f500.csv", index_col=0)
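To see what index_col=0 changes, here is a small sketch that reads an in-memory CSV instead of f500.csv. The rows are hypothetical; the real file is assumed to hold company names in its first column.

```python
import io
import pandas as pd

# A tiny stand-in for f500.csv (hypothetical rows).
csv_text = """company,rank,revenues,country
Company A,1,100.0,USA
Company B,2,80.5,China
"""

# Without index_col, pandas adds a default integer index (0, 1, ...).
default_idx = pd.read_csv(io.StringIO(csv_text))
print(default_idx.index.tolist())  # [0, 1]

# With index_col=0, the first column becomes the row labels.
labelled = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(labelled.index.tolist())    # ['Company A', 'Company B']
print(labelled.columns.tolist())  # ['rank', 'revenues', 'country']
```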
Use the DataFrame.shape attribute to assign the shape of f500 to f500_shape.
f500_shape = f500.shape
Use Python’s type() function to assign the type of f500 to f500_type.
f500_type = type(f500)
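A short sketch on a hypothetical 3-row frame: shape is an attribute (no parentheses) holding a (rows, columns) tuple, and type() confirms the object is a DataFrame.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame standing in for f500.
df = pd.DataFrame({"rank": [1, 2, 3], "country": ["USA", "China", "USA"]})

shape = df.shape            # a plain tuple: (number of rows, number of columns)
n_rows, n_cols = df.shape   # tuples unpack, handy for quick checks
print(shape)                # (3, 2)
print(type(df) is pd.DataFrame)  # True
```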
Use the head() method to select the first 6 rows of f500. Assign the result to f500_head.
f500_head = f500.head(6)
Use the tail() method to select the last 8 rows of f500. Assign the result to f500_tail.
f500_tail = f500.tail(8)
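A sketch on a hypothetical 10-row frame showing that head() and tail() default to 5 rows, take an optional row count, and return a DataFrame.

```python
import pandas as pd

# Hypothetical frame with 10 rows labelled r0..r9.
df = pd.DataFrame({"value": list(range(10))}, index=[f"r{i}" for i in range(10)])

first_default = df.head()  # n defaults to 5
first_six = df.head(6)     # first 6 rows, like f500.head(6)
last_eight = df.tail(8)    # last 8 rows, like f500.tail(8)
everything = df.head(20)   # asking for more rows than exist returns all of them

print(first_default.index.tolist())  # ['r0', 'r1', 'r2', 'r3', 'r4']
print(last_eight.index.tolist()[0])  # r2
```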
Use the DataFrame.info() method to display information about the f500 dataframe.
f500.info()
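Note that info() prints its summary rather than returning it. A sketch on a hypothetical frame with one missing value, showing the non-null counts and how to capture the text through the buf parameter.

```python
import io
import pandas as pd

# Hypothetical frame with one missing profit value.
df = pd.DataFrame({"rank": [1, 2, 3], "profits": [10.0, None, 7.5]})

# info() prints index, columns, non-null counts, dtypes and memory usage,
# and returns None, so there is nothing useful to assign.
result = df.info()

# To keep the summary as a string, pass a writable buffer via buf=.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print("2 non-null" in summary)  # True: profits has one missing value
```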
Select the industry column of f500. Assign the result to the variable name industries.
industries = f500["industry"]
In order, select the revenues and years_on_global_500_list columns. Assign the result to the variable name revenues_years.
revenues_years = f500[["revenues", "years_on_global_500_list"]]
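The bracket style decides the return type: a single label gives a Series, a list of labels gives a DataFrame with columns in the listed order. A sketch with hypothetical values.

```python
import pandas as pd

# Hypothetical values; only the column names mirror f500.
df = pd.DataFrame(
    {
        "industry": ["Retail", "Energy"],
        "revenues": [100.0, 80.5],
        "years_on_global_500_list": [23, 17],
    },
    index=["Company A", "Company B"],
)

one_col = df["industry"]                                 # single label -> Series
two_cols = df[["revenues", "years_on_global_500_list"]]  # list of labels -> DataFrame
one_col_frame = df[["industry"]]                         # one-item list -> 1-column DataFrame

print(type(one_col).__name__, type(two_cols).__name__)   # Series DataFrame
print(two_cols.columns.tolist())  # ['revenues', 'years_on_global_500_list']
```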
In order, select all columns from ceo up to and including sector. Assign the result to the variable name ceo_to_sector.
ceo_to_sector = f500.loc[:, "ceo":"sector"]
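Label slices with loc include the end label, unlike position slices with iloc. A sketch using a hypothetical column layout (the real f500 column order is assumed, not checked).

```python
import pandas as pd

# Hypothetical frame whose column names mimic part of the f500 layout.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["rank", "ceo", "industry", "sector"])

by_label = df.loc[:, "ceo":"sector"]  # label slice: both endpoints included
by_position = df.iloc[:, 1:3]         # position slice: stop is excluded

print(by_label.columns.tolist())     # ['ceo', 'industry', 'sector']
print(by_position.columns.tolist())  # ['ceo', 'industry']
```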
By selecting data from f500:
Create a new variable toyota, with:
Just the row with index Toyota Motor.
All columns.
toyota = f500.loc["Toyota Motor"]
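Selecting one row label with loc returns a Series indexed by the column names; wrapping the label in a list keeps it as a one-row DataFrame. A sketch with hypothetical rows.

```python
import pandas as pd

# Hypothetical rows standing in for f500.
df = pd.DataFrame(
    {"rank": [1, 2], "country": ["Japan", "USA"]},
    index=["Company A", "Company B"],
)

row_series = df.loc["Company A"]   # one label -> Series (index = column names)
row_frame = df.loc[["Company A"]]  # list with one label -> 1-row DataFrame

print(row_series["country"])  # Japan
print(row_frame.shape)        # (1, 2)
```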
By selecting data from f500: Create a new variable, drink_companies, with:
Rows with indices Anheuser-Busch InBev, Coca-Cola, and Heineken Holding, in that order.
All columns.
drink_companies = f500.loc[["Anheuser-Busch InBev", "Coca-Cola", "Heineken Holding"]]
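With a list of labels, loc returns the rows in the order of the list, and any label missing from the index raises a KeyError. A sketch with hypothetical labels.

```python
import pandas as pd

# Hypothetical index standing in for f500.
df = pd.DataFrame({"rank": [1, 2, 3]}, index=["Company A", "Company B", "Company C"])

# Rows come back in the order of the list, not the order in df.
picked = df.loc[["Company C", "Company A"]]
print(picked.index.tolist())  # ['Company C', 'Company A']

# A label that is not in the index raises KeyError.
raised = False
try:
    df.loc[["Company A", "Company Z"]]
except KeyError:
    raised = True
print(raised)  # True
```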
By selecting data from f500: Create a new variable, middle_companies, with all rows with indices from Tata Motors to Nationwide, inclusive.
All columns from rank to country, inclusive.
middle_companies = f500.loc["Tata Motors":"Nationwide", "rank":"country"]
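A row-label slice follows the index's current order, not alphabetical order, so it selects everything positioned between the two labels. A sketch on a hypothetical, unsorted index with unique labels; reversing the endpoints gives an empty result.

```python
import pandas as pd

# Hypothetical index in non-alphabetical order (labels are unique).
df = pd.DataFrame(
    {
        "rank": [10, 11, 12, 13],
        "revenues": [5.0, 4.0, 3.0, 2.0],
        "country": ["India", "USA", "UK", "USA"],
    },
    index=["Zeta", "Alpha", "Mu", "Beta"],
)

# Rows from "Alpha" through "Mu" by position in the index, plus a column range;
# both label slices include their endpoints.
middle = df.loc["Alpha":"Mu", "rank":"country"]
print(middle.index.tolist())  # ['Alpha', 'Mu']

# Start label positioned after the end label: no rows are selected.
backwards = df.loc["Mu":"Alpha"]
print(len(backwards))  # 0
```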
We’ve already saved a selection of data from f500 to a dataframe named f500_sel.
Find the counts of each unique value in the country column in the f500_sel dataframe.
Select the country column in the f500_sel dataframe. Assign it to a variable named countries.
Use the Series.value_counts() method to return the value counts for countries. Assign the results to country_counts.
countries = f500_sel["country"]
country_counts = countries.value_counts()
print(country_counts)
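value_counts() returns a Series whose index holds the unique values and whose values are their counts, sorted from most to least frequent. A sketch on a hypothetical country column; normalize=True gives proportions instead.

```python
import pandas as pd

# Hypothetical country column.
countries = pd.Series(["USA", "China", "USA", "Japan", "USA", "China"], name="country")

country_counts = countries.value_counts()  # counts per unique value, descending
print(country_counts.index.tolist())  # ['USA', 'China', 'Japan']
print(country_counts.tolist())        # [3, 2, 1]

# normalize=True returns each value's share of the total instead of raw counts.
shares = countries.value_counts(normalize=True)
```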
From the pandas Series countries_counts:
Select the item at index label India. Assign the result to the variable name india.
countries = f500["country"]
countries_counts = countries.value_counts()
india = countries_counts.loc["India"]
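On a Series, loc selects by label and iloc by position. Since loc raises KeyError for a missing label, Series.get can return a default instead. A sketch with hypothetical counts.

```python
import pandas as pd

# Hypothetical counts standing in for countries_counts.
countries_counts = pd.Series([3, 2, 1], index=["USA", "China", "India"])

india = countries_counts.loc["India"]  # by label
top = countries_counts.iloc[0]         # by position: the first (largest) count
print(india, top)                      # 1 3

# Series.get returns the default when the label is missing, instead of raising.
brazil = countries_counts.get("Brazil", 0)
print(brazil)  # 0
```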