Pandas Flashcards
Dataframes are the pandas equivalent of a NumPy 2D ndarray, with a few key differences:
Axis values can have string labels, not just numeric ones.
Dataframes can contain columns with multiple data types, including integer, float, and string.
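As a rough illustration of both differences, here is a minimal sketch on a hypothetical two-row dataframe. The name demo and all of its values are made up for the example and do not come from f500.csv.

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration.
demo = pd.DataFrame(
    {"rank": [1, 2], "revenues": [100.5, 90.25], "country": ["USA", "China"]},
    index=["Company A", "Company B"],
)

# Rows and columns are both addressed by string labels.
print(demo.loc["Company A", "country"])  # USA

# Each column keeps its own data type (integer, float, and string here).
print(demo.dtypes)
```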
Read CSV with pandas
import pandas as pd
f500 = pd.read_csv('f500.csv', index_col=0)
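To see what index_col=0 does without needing the real file, the sketch below reads a small in-memory CSV. The CSV text and the name demo are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk; values are made up.
csv_text = "company,rank,country\nCompany A,1,USA\nCompany B,2,China\n"

# index_col=0 turns the first column (company) into the row labels.
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(demo.index.tolist())    # row labels taken from the first column
print(demo.columns.tolist())  # the remaining columns
```

Without index_col, pandas would keep company as an ordinary column and number the rows 0, 1, 2, and so on.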
Use the DataFrame.shape attribute to assign the shape of f500 to f500_shape.
f500_shape = f500.shape
Use Python’s type() function to assign the type of f500 to f500_type.
f500_type = type(f500)
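A quick check of what these two cards return, sketched on a hypothetical three-row dataframe (demo and its values are made up):

```python
import pandas as pd

# Hypothetical stand-in data.
demo = pd.DataFrame({"rank": [1, 2, 3], "country": ["USA", "China", "USA"]})

# shape is an attribute, not a method, so it takes no parentheses.
demo_shape = demo.shape
demo_type = type(demo)

print(demo_shape)                 # (3, 2): rows first, then columns
print(demo_type is pd.DataFrame)  # True
```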
Use the head() method to select the first 6 rows of f500. Assign the result to f500_head.
f500_head = f500.head(6)
Use the tail() method to select the last 8 rows of f500. Assign the result to f500_tail.
f500_tail = f500.tail(8)
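The sketch below shows head() and tail() on a hypothetical ten-row dataframe, including the default of five rows when no argument is given. The name demo is an assumption for the example.

```python
import pandas as pd

# Hypothetical ten-row frame with ranks 1 through 10.
demo = pd.DataFrame({"rank": range(1, 11)})

first_three = demo.head(3)   # first 3 rows
last_two = demo.tail(2)      # last 2 rows
default_head = demo.head()   # no argument: first 5 rows

print(first_three["rank"].tolist())  # [1, 2, 3]
print(last_two["rank"].tolist())     # [9, 10]
print(len(default_head))             # 5
```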
Use the DataFrame.info() method to display information about the f500 dataframe.
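A likely answer for this card is DataFrame.info(), which prints the index type, the column names, the non-null counts, and the dtypes. A minimal sketch on a hypothetical dataframe (demo and its values are made up):

```python
import pandas as pd

# Hypothetical stand-in data.
demo = pd.DataFrame({"rank": [1, 2], "country": ["USA", "China"]})

# info() prints its summary and returns None, so there is nothing to assign.
demo.info()
```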
Select the industry column of f500. Assign the result to the variable name industries.
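One way to answer this card is single-bracket selection with the column label, which returns a Series. The f500 built here is a tiny made-up stand-in, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for f500; the rows are invented.
f500 = pd.DataFrame(
    {"rank": [1, 2], "industry": ["Retail", "Energy"]},
    index=["Company A", "Company B"],
)

# A single column label inside single brackets returns a Series.
industries = f500["industry"]

print(type(industries) is pd.Series)  # True
```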
In order, select the revenues and years_on_global_500_list columns. Assign the result to the variable name revenues_years.
revenues_years = f500[["revenues", "years_on_global_500_list"]]
In order, select all columns from ceo up to and including sector. Assign the result to the variable name ceo_to_sector.
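A possible answer uses a label slice on the column axis of loc. The sketch assumes ceo comes before sector in the column order and uses a made-up stand-in for f500 with only a few columns.

```python
import pandas as pd

# Hypothetical stand-in for f500; column order and values are assumptions.
f500 = pd.DataFrame(
    {
        "rank": [1, 2],
        "ceo": ["Person A", "Person B"],
        "industry": ["Retail", "Energy"],
        "sector": ["Retailing", "Energy"],
        "country": ["USA", "China"],
    },
    index=["Company A", "Company B"],
)

# ":" keeps every row; the column slice includes both ceo and sector.
ceo_to_sector = f500.loc[:, "ceo":"sector"]

print(ceo_to_sector.columns.tolist())  # ['ceo', 'industry', 'sector']
```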
By selecting data from f500:
Create a new variable toyota, with:
Just the row with index Toyota Motor.
All columns.
toyota = f500.loc["Toyota Motor"]
By selecting data from f500: Create a new variable, drink_companies, with:
Rows with indices Anheuser-Busch InBev, Coca-Cola, and Heineken Holding, in that order.
All columns.
drink_companies = f500.loc[["Anheuser-Busch InBev", "Coca-Cola", "Heineken Holding"]]
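The two row-selection cards above return different types: a single label gives a Series, while a list of labels gives a dataframe with rows in the requested order. A minimal sketch on hypothetical data (demo and its labels are made up):

```python
import pandas as pd

# Hypothetical stand-in data.
demo = pd.DataFrame(
    {"rank": [1, 2, 3], "country": ["USA", "China", "Japan"]},
    index=["Company A", "Company B", "Company C"],
)

# One label: the row comes back as a Series indexed by the column names.
single = demo.loc["Company B"]

# A list of labels: a dataframe, with rows in the order listed.
several = demo.loc[["Company C", "Company A"]]

print(type(single) is pd.Series)  # True
print(several.index.tolist())     # ['Company C', 'Company A']
```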
By selecting data from f500: Create a new variable, middle_companies, with:
All rows with indices from Tata Motors to Nationwide, inclusive.
All columns from rank to country, inclusive.
middle_companies = f500.loc["Tata Motors":"Nationwide", "rank":"country"]
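The word "inclusive" matters here: label slices with loc include both endpoints, unlike position slices with iloc, which exclude the stop position. A small sketch on hypothetical data (demo and its values are made up):

```python
import pandas as pd

# Hypothetical stand-in data.
demo = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4],
        "revenues": [40.0, 30.0, 20.0, 10.0],
        "country": ["USA", "China", "Japan", "India"],
    },
    index=["Company A", "Company B", "Company C", "Company D"],
)

# Label slice with loc: both endpoints are included.
by_label = demo.loc["Company B":"Company C", "rank":"revenues"]

# Position slice with iloc: the stop position is excluded.
by_position = demo.iloc[1:3, 0:2]

print(by_label.equals(by_position))  # True
```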
We’ve already saved a selection of data from f500 to a dataframe named f500_sel.
Find the counts of each unique value in the country column in the f500_sel dataframe.
Select the country column in the f500_sel dataframe. Assign it to a variable named countries.
Use the Series.value_counts() method to return the value counts for countries. Assign the results to country_counts.
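These two steps could look like the sketch below. The f500_sel built here is a made-up stand-in for the saved selection, so the counts are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for f500_sel; the rows are invented.
f500_sel = pd.DataFrame(
    {"country": ["USA", "China", "USA", "Japan"]},
    index=["Company A", "Company B", "Company C", "Company D"],
)

countries = f500_sel["country"]

# value_counts() returns a Series: unique values become the index labels,
# counts become the values, sorted from most to least frequent.
country_counts = countries.value_counts()

print(country_counts["USA"])  # 2
```

Because the unique values become the index of the result, a single count can then be selected by label.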
From the pandas series countries_counts:
Select the item at index label India. Assign the result to the variable name india.
countries = f500['country']
countries_counts = countries.value_counts()
india = countries_counts.loc["India"]