Pandas Python Flashcards
Example of creating a pandas DataFrame from a list
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
Creating a pandas DataFrame from a dictionary of arrays or lists
import pandas as pd
# initialise data as a dictionary of lists
data = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output
print(df)
Line for creating a pandas DataFrame
df = pd.DataFrame(lst) or df = pd.DataFrame(data). You can create a DataFrame from either a list or a dictionary.
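A minimal sketch of what the constructor does with a plain list: the single column is labeled 0 unless a name is supplied through the columns parameter. The column name "Word" is a hypothetical label chosen for illustration, not something from the original cards.

```python
import pandas as pd

lst = ["Geeks", "For", "Geeks", "is", "portal", "for", "Geeks"]

# Without column names, the single column gets the default label 0
df_default = pd.DataFrame(lst)

# "Word" is a hypothetical column name used only for this sketch
df_named = pd.DataFrame(lst, columns=["Word"])

print(df_default.columns.tolist())
print(df_named.columns.tolist())
```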
Line for selecting two columns by label
print(df[['Name', 'Qualification']]) where Name and Qualification are column labels
Full code for selecting two columns out of several
import pandas as pd
# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select two columns
print(df[['Name', 'Qualification']])
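The double brackets matter here. As a rough illustration using the same employee data, passing a list of labels returns a DataFrame, while passing a single label returns a Series. The variable names two_cols and one_col are introduced only for this sketch.

```python
import pandas as pd

# Same employee data as in the card above
data = {
    "Name": ["Jai", "Princi", "Gaurav", "Anuj"],
    "Age": [27, 24, 22, 32],
    "Address": ["Delhi", "Kanpur", "Allahabad", "Kannauj"],
    "Qualification": ["Msc", "MA", "MCA", "Phd"],
}
df = pd.DataFrame(data)

# A list of labels inside the brackets returns a DataFrame
two_cols = df[["Name", "Qualification"]]

# A single label returns a Series
one_col = df["Name"]

print(type(two_cols).__name__)  # DataFrame
print(type(one_col).__name__)   # Series
```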
Line for reading a CSV file while using one column as the row index
data = pd.read_csv("nba.csv", index_col="Name")
Code for reading a CSV with one column as the index and then printing the full details of one row
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
print(first)
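To try this without the nba.csv file, one option is to read from an in-memory string. The sketch below assumes a reduced stand-in with only Name, Team, and Age columns, built from the four players quoted in these cards; the real file's full column layout is not shown here and is not assumed.

```python
import io
import pandas as pd

# Stand-in for nba.csv (assumption: only three columns, four rows)
csv_text = """Name,Team,Age
Avery Bradley,Boston Celtics,25.0
Jae Crowder,Boston Celtics,25.0
John Holland,Boston Celtics,27.0
R.J. Hunter,Boston Celtics,22.0
"""

# Use the Name column as the row index
data = pd.read_csv(io.StringIO(csv_text), index_col="Name")

# .loc with an index label returns that row as a Series
first = data.loc["Avery Bradley"]
print(first)
```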
Code for printing the unique key (Name) and its corresponding attribute (Age)
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving columns by indexing operator
first = data["Age"]
print(first)
Why is it necessary to specify index_col when reading the CSV file, as in pd.read_csv("nba.csv", index_col="Name")?
Because without it, pandas labels the rows with serial numbers (the default integer index) instead of names, so entries cannot be looked up by name and the printed Age column looks like this:
453 26.0
454 24.0
455 26.0
456 26.0
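The difference can be seen side by side with a small in-memory stand-in for nba.csv. This is a sketch under the assumption of a three-column file holding the four players quoted in these cards; default_idx and by_name are names made up for the comparison.

```python
import io
import pandas as pd

# Stand-in for nba.csv (assumption: only three columns, four rows)
csv_text = """Name,Team,Age
Avery Bradley,Boston Celtics,25.0
Jae Crowder,Boston Celtics,25.0
John Holland,Boston Celtics,27.0
R.J. Hunter,Boston Celtics,22.0
"""

# No index_col: rows are labeled 0, 1, 2, 3
default_idx = pd.read_csv(io.StringIO(csv_text))

# index_col="Name": rows are labeled by player name
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Name")

print(default_idx["Age"])
print(by_name["Age"])
```

With the name index in place, by_name.loc["John Holland", "Age"] looks up a value by player name, which the integer-indexed frame cannot do directly.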
What would the output be if you use a column whose values are NOT unique as the index? Ex: data = pd.read_csv("nba.csv", index_col="Team")
Instead of getting
Avery Bradley 25.0
Jae Crowder 25.0
John Holland 27.0
R.J. Hunter 22.0
you will get something like
Boston Celtics 25.0
Boston Celtics 25.0
Boston Celtics 27.0
Boston Celtics 22.0
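A short sketch of what a non-unique index means in practice, again using an assumed three-column stand-in for nba.csv: the index reports that it is not unique, and looking up the repeated label with .loc returns every matching row rather than a single one.

```python
import io
import pandas as pd

# Stand-in for nba.csv (assumption: only three columns, four rows)
csv_text = """Name,Team,Age
Avery Bradley,Boston Celtics,25.0
Jae Crowder,Boston Celtics,25.0
John Holland,Boston Celtics,27.0
R.J. Hunter,Boston Celtics,22.0
"""

# Team repeats across rows, so the resulting index is not unique
by_team = pd.read_csv(io.StringIO(csv_text), index_col="Team")
print(by_team.index.is_unique)

# A repeated label selects all rows that carry it
celtics = by_team.loc["Boston Celtics"]
print(celtics)
```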