Python Pandas Flashcards
Create a dictionary called my_dict with the following three key value pairs:
key ‘country’ and value names.
key ‘drives_right’ and value dr.
key ‘cars_per_cap’ and value cpc.
my_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}
Use pd.read_csv() to import cars.csv data as a DataFrame. Store this DataFrame as cars.
import pandas as pd
cars = pd.read_csv('cars.csv')
Specify the index_col argument inside pd.read_csv(): set it to 0, so that the first column is used as row labels.
cars = pd.read_csv('cars.csv', index_col=0)
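To see what index_col=0 changes, here is a minimal sketch that reads the same text twice. The CSV content is a hypothetical stand-in for cars.csv (the values are illustrative only), and io.StringIO is used so that no file is needed.

```python
import io
import pandas as pd

# Hypothetical stand-in for cars.csv; the values are illustrative only.
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
JPN,588,Japan,False
MOR,70,Morocco,True
"""

# Without index_col, the first column becomes an ordinary data column
# and pandas generates integer row labels 0, 1, 2.
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column supplies the row labels.
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_index.index.tolist())
print(cars.index.tolist())
```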
Use single square brackets to print out the ‘country’ column of cars as a Pandas Series.
print(cars['country'])
Use double square brackets to print out the ‘country’ column of cars as a Pandas DataFrame.
print(cars[['country']])
Use double square brackets to print out a DataFrame with both the ‘country’ and ‘drives_right’ columns of cars, in this order.
print(cars[['country', 'drives_right']])
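The difference between single and double square brackets can be checked directly. This is a small sketch on a hand-built DataFrame whose values are assumed for illustration, not taken from cars.csv.

```python
import pandas as pd

# Illustrative data; the real cars.csv values may differ.
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "Morocco"],
     "drives_right": [True, False, True]},
    index=["US", "JPN", "MOR"],
)

as_series = cars["country"]    # single brackets: one column as a Series
as_frame = cars[["country"]]   # double brackets: a one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```

The inner brackets are just a Python list of column names, which is why the same syntax extends to several columns.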
Select the first 3 observations from cars and print them out.
print(cars[0:3])
Select the fourth, fifth and sixth observation, corresponding to row indexes 3, 4 and 5, and print them out.
print(cars[3:6])
Three ways to inspect a DataFrame:
Inspect the first few rows (including index labels)
print(df.head())
Inspect the last few rows
print(df.tail())
Inspect random sample rows
print(df.sample(5))
Use loc or iloc to select the observation corresponding to Japan as a Series. The label of this row is JPN, the index is 2. Make sure to print the resulting Series.
print(cars.loc['JPN'])
print(cars.iloc[2])
Print out the ‘drives_right’ value of the row corresponding to Morocco (its row label is MOR).
print(cars.loc['MOR', 'drives_right'])
Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns ‘country’ and ‘drives_right’.
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
Print out the drives_right column of the cars DataFrame as a Series using loc.
print(cars.loc[:, 'drives_right'])
Print out the drives_right column as a DataFrame using loc
print(cars.loc[:, ['drives_right']])
Print out both the cars_per_cap and drives_right column as a DataFrame using loc
print(cars.loc[:, ['cars_per_cap', 'drives_right']])
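The loc and iloc cards above can be tried side by side. The following is a minimal sketch on an assumed three-row DataFrame (illustrative values), showing that loc selects by label, iloc selects by integer position, and a list selector keeps the result a DataFrame.

```python
import pandas as pd

# Illustrative data; the real cars.csv values may differ.
cars = pd.DataFrame(
    {"cars_per_cap": [809, 588, 70],
     "country": ["United States", "Japan", "Morocco"],
     "drives_right": [True, False, True]},
    index=["US", "JPN", "MOR"],
)

by_label = cars.loc["JPN"]     # row selected by its label
by_position = cars.iloc[1]     # same row selected by its position

single_value = cars.loc["MOR", "drives_right"]       # one cell
column_series = cars.loc[:, "drives_right"]          # Series
column_frame = cars.loc[:, ["drives_right"]]         # DataFrame
sub_frame = cars.loc[["JPN", "MOR"], ["country", "drives_right"]]

print(by_label.equals(by_position))
print(sub_frame)
```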
Which areas in my_house are greater than 18.5 or smaller than 10?
print(np.logical_or(my_house > 18.5, my_house < 10))
Which areas are smaller than 11 in both my_house and your_house? Make sure to wrap both commands in print() statement, so that you can inspect the output.
print(np.logical_and(my_house < 11, your_house < 11))
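A short sketch of the two NumPy cards above, with made-up area values for my_house and your_house. It also shows that, for boolean arrays, the | and & operators give the same result as np.logical_or and np.logical_and, provided each comparison is parenthesized.

```python
import numpy as np

# Made-up room areas for illustration.
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Element-wise "or": True where either condition holds.
either = np.logical_or(my_house > 18.5, my_house < 10)

# Element-wise "and": True only where both conditions hold.
both = np.logical_and(my_house < 11, your_house < 11)

# Equivalent operator forms; the parentheses are required.
either_op = (my_house > 18.5) | (my_house < 10)
both_op = (my_house < 11) & (your_house < 11)

print(either)
print(both)
```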
Write an if statement that prints out “looking around in the kitchen.” if room equals “kit”.
if room == "kit":
    print("looking around in the kitchen.")
Write another if statement that prints out “big place!” if area is greater than 15.
if area > 15:
    print("big place!")
Extract the drives_right column as a Pandas Series and store it as dr.
dr = cars['drives_right']
Use dr, a boolean Series, to subset the ‘cars’ DataFrame. Store the resulting selection in ‘sel’.
sel = cars[dr]
Select the cars_per_cap column as a Pandas Series and store it as cpc
cpc = cars['cars_per_cap']
Create car_maniac: observations that have a cars_per_cap over 500.
cpc = cars['cars_per_cap']
many_cars = cpc > 500  # This creates a boolean Series
car_maniac = cars[many_cars]  # Keep only the rows where many_cars is True
print(car_maniac)
Create medium: observations with cars_per_cap between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
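The filtering cards above all follow the same pattern: build a boolean Series, then pass it to the DataFrame in square brackets. Here is a minimal sketch on assumed data (the numbers are illustrative), including the & operator as an alternative to np.logical_and.

```python
import numpy as np
import pandas as pd

# Illustrative data; the real cars.csv values may differ.
cars = pd.DataFrame(
    {"cars_per_cap": [809, 588, 200, 70],
     "country": ["United States", "Japan", "Russia", "Morocco"]},
    index=["US", "JPN", "RU", "MOR"],
)

cpc = cars["cars_per_cap"]

# Rows with more than 500 cars per capita.
car_maniac = cars[cpc > 500]

# Rows strictly between 100 and 500, written two equivalent ways.
medium = cars[np.logical_and(cpc > 100, cpc < 500)]
medium_op = cars[(cpc > 100) & (cpc < 500)]

print(car_maniac)
print(medium)
```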
Write a for loop that iterates over all elements of the areas list and prints each one.
for area in areas:
    print(area)
Write a for loop using enumerate(). On each run, print a line of the form “room x: y”, where x is the index of the list element and y is the element itself, i.e. the area.
for index, area in enumerate(areas):
    print('room ' + str(index) + ': ' + str(area))
Adapt the following so that the first printout becomes “room 1: 11.25”, the second one “room 2: 18.0” and so on:
# Original
for index, area in enumerate(areas):
    print("room " + str(index) + ": " + str(area))
# Adapted
for index, area in enumerate(areas):
    print("room " + str(index + 1) + ": " + str(area))
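As an alternative to adding 1 by hand, enumerate() accepts a start argument. The sketch below assumes a sample areas list (the values are illustrative) and collects the lines in a list so they can be inspected as well as printed.

```python
# Sample list of room areas, assumed for illustration.
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

lines = []
# start=1 makes the counter begin at 1 instead of 0.
for index, area in enumerate(areas, start=1):
    lines.append(f"room {index}: {area}")

for line in lines:
    print(line)
```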
Write a for loop that goes through each sublist of house and prints out “the x is y sqm”, where x is the name of the room and y is the area of the room.
for x in house:
    print("the " + x[0] + " is " + str(x[1]) + " sqm")
Write a for loop that goes through each key:value pair of europe. On each iteration, “the capital of x is y” should be printed out, where x is the key and y is the value of the pair.
for key, value in europe.items():
    print("the capital of " + str(key) + " is " + str(value))
Write a for loop that iterates over all elements in np_height and prints out “x inches” for each element, where x is the value in the array.
for x in np_height:
    print(str(x) + " inches")
Write a for loop that visits every element of the np_baseball array and prints it out.
for x in np.nditer(np_baseball):
    print(x)
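For a 2D array, a plain for loop yields whole rows, while np.nditer yields individual elements. A small sketch with an assumed three-row np_baseball array (made-up height and weight values) shows the difference.

```python
import numpy as np

# Made-up [height, weight] rows for illustration.
np_baseball = np.array([[74, 180], [74, 215], [72, 210]])

# Iterating directly over a 2D array gives one row per step.
rows = [row.tolist() for row in np_baseball]

# np.nditer visits every single element instead.
values = [int(x) for x in np.nditer(np_baseball)]

print(rows)
print(values)
```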
Write a for loop that iterates over the rows of cars and on each iteration performs two print() calls: one to print out the row label and one to print out all of the row's contents.
for lab, row in cars.iterrows():
    print(lab)
    print(row)
Add the length of the country names of the brics DataFrame in a new column.
for lab, row in brics.iterrows():
    brics.loc[lab, "name_length"] = len(row["country"])
Use a for loop to add a new column, named COUNTRY, that contains an uppercase version of the country names in the “country” column. You can use the string method upper() for this.
for lab, row in cars.iterrows():
    cars.loc[lab, "COUNTRY"] = row["country"].upper()
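The loop above works, but pandas also offers vectorized string methods through the .str accessor, which usually replace a row-by-row loop. This is a swapped-in alternative rather than the method the cards ask for; the sketch uses assumed sample data and checks that both approaches give the same column.

```python
import pandas as pd

# Illustrative data; the real cars.csv values may differ.
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "Morocco"]},
    index=["US", "JPN", "MOR"],
)

# Loop version, as on the flashcard.
looped = cars.copy()
for lab, row in looped.iterrows():
    looped.loc[lab, "COUNTRY"] = row["country"].upper()

# Vectorized version: operate on the whole column at once.
vectorized = cars.copy()
vectorized["COUNTRY"] = vectorized["country"].str.upper()
vectorized["name_length"] = vectorized["country"].str.len()

print(vectorized)
```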
Note: the body of every for loop and if statement must be indented.