Pandas Flashcards
How to check the first 5 rows of df?
df.head()
How to load a CSV file that uses ; as the delimiter?
import pandas as pd
df = pd.read_csv("filename.csv", delimiter=";")
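A small sketch of the semicolon case. It assumes an in-memory string wrapped in io.StringIO in place of a real filename.csv, and the column names and values are made up. In read_csv, delimiter is documented as an alias for sep, so either spelling works.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content standing in for "filename.csv"
raw = "id;name;price\n1;Cozy loft;120\n2;Garden room;85\n"

# delimiter=";" and sep=";" are equivalent: delimiter is an alias for sep
df = pd.read_csv(io.StringIO(raw), delimiter=";")

print(df.shape)
print(df.columns.tolist())
```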
How to get the number of rows and columns?
df.shape
How to get summary statistics of numerical variables?
df.describe()
How to check for missing values?
df.isna().sum()
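To see what the final parentheses change, here is a toy frame with made-up values. isna() returns a boolean DataFrame, and sum() counts the True cells column by column. Leaving off the final () gives back the bound method instead of the counts.

```python
import numpy as np
import pandas as pd

# Hypothetical listings with some gaps
df = pd.DataFrame({
    "price": [120, np.nan, 85],
    "reviews_per_month": [np.nan, np.nan, 1.5],
})

# isna() marks missing cells as True; sum() counts them per column
missing = df.isna().sum()
print(missing)

# Without the call parentheses this is only the method object, not the counts
print(df.isna().sum)
```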
How to get only rows where the neighborhood group is Manhattan?
subset = df[df["neighbourhood_group"] == "Manhattan"]
How to get only rows where the room type is Private room and the neighborhood group is Brooklyn?
subset = df[(df["room_type"] == "Private room") & (df["neighbourhood_group"] == "Brooklyn")]
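The parentheses around each comparison matter: in Python, & binds more tightly than ==, so without them the expression is evaluated in the wrong order and typically fails with an error. A sketch on hypothetical listings (the values, including "Entire home/apt", are invented), also showing | for "either condition":

```python
import pandas as pd

df = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan"],
    "room_type": ["Private room", "Entire home/apt", "Private room"],
    "price": [60, 150, 90],
})

# Building each mask separately sidesteps precedence issues;
# written inline, each comparison needs its own parentheses
is_private = df["room_type"] == "Private room"
in_brooklyn = df["neighbourhood_group"] == "Brooklyn"

subset = df[is_private & in_brooklyn]   # both conditions must hold
either = df[is_private | in_brooklyn]   # at least one condition holds
print(subset)
```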
How to sort rows by price in descending order?
sorted_df = df.sort_values(by="price", ascending=False)
How to create a new column price_per_night by dividing price by minimum_nights?
df["price_per_night"] = df["price"] / df["minimum_nights"]
How to convert ‘last_review’ to datetime format?
df["last_review"] = pd.to_datetime(df["last_review"], errors="coerce")
How to extract the year from “last_review”?
df["review_year"] = df["last_review"].dt.year
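The conversion and the year extraction go together, since the .dt accessor only works on datetime columns. A sketch with made-up review dates, showing that errors="coerce" turns an unparseable value into NaT, which then produces a missing year:

```python
import pandas as pd

# Hypothetical raw values: one valid date, one bad string, one missing
df = pd.DataFrame({"last_review": ["2019-05-21", "not a date", None]})

# errors="coerce" replaces anything unparseable with NaT instead of raising
df["last_review"] = pd.to_datetime(df["last_review"], errors="coerce")

# .dt.year works on the datetime column; NaT rows get NaN
df["review_year"] = df["last_review"].dt.year
print(df)
```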
How to find the average price of listings per neighbourhood_group?
avg_df = df.groupby("neighbourhood_group")["price"].mean().reset_index()
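A toy version with invented prices, to show what reset_index() adds. Without it the result is a Series indexed by neighbourhood_group. With it, the group keys become an ordinary column next to the averaged price.

```python
import pandas as pd

df = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Brooklyn", "Manhattan"],
    "price": [100, 200, 50, 300],
})

# Series indexed by group name
avg_series = df.groupby("neighbourhood_group")["price"].mean()

# Same numbers as a two-column DataFrame
avg_df = avg_series.reset_index()
print(avg_df)
```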
How to fill missing values in ‘reviews_per_month’ with 0?
df["reviews_per_month"] = df["reviews_per_month"].fillna(0)
How to drop rows where price is missing?
df = df.dropna(subset=["price"])
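The two cleaning steps above, sketched on a small hypothetical frame: fillna(0) keeps every row and only replaces the gaps, while dropna(subset=[...]) removes just the rows where the listed column is missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [100.0, np.nan, 80.0],
    "reviews_per_month": [np.nan, 2.0, np.nan],
})

# Missing review rates become 0; no rows are removed
df["reviews_per_month"] = df["reviews_per_month"].fillna(0)

# Only the row without a price is dropped; the original index labels are kept
df = df.dropna(subset=["price"])
print(df)
```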
How to group by neighborhood group and count the listings?
df_counts = df.groupby("neighbourhood_group").size().reset_index(name="num_listings")
How to find the row with the highest price (returns its index label)?
most_exp = df["price"].idxmax()
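idxmax() returns the index label of the first maximum, not the row itself; passing that label to .loc retrieves the whole row. In this sketch the listing ids are assumed to be the index, and the names and prices are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Loft", "Studio", "Penthouse"], "price": [120, 80, 500]},
    index=[101, 102, 103],  # hypothetical listing ids used as the index
)

most_exp = df["price"].idxmax()   # index label of the highest price
row = df.loc[most_exp]            # full row looked up by that label
print(most_exp, row["name"])
```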
How to access multiple columns?
print(df[["var1", "var2", "var3"]])
How to count unique room types?
print(df["room_type"].nunique())
How to see the last 5 rows?
print(df.tail())
How to show all rows where the neighborhood group is Manhattan using .loc?
print(df.loc[df["neighbourhood_group"] == "Manhattan"])
How to retrieve only the name, room_type, and price columns for all listings in Brooklyn?
print(df.loc[df["neighbourhood_group"] == "Brooklyn", ["name", "room_type", "price"]])
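The general pattern is df.loc[rows, columns]: a boolean mask selects the rows and a list of names selects the columns, both in one step. A sketch on invented listings:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Loft", "Studio", "Suite"],
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Private room"],
    "price": [150, 90, 70],
    "minimum_nights": [2, 1, 3],
})

# Row selector (mask) first, column selector (list of names) second
brooklyn = df.loc[df["neighbourhood_group"] == "Brooklyn", ["name", "room_type", "price"]]
print(brooklyn)
```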
How to get rows 3 to 7 and columns 1 to 4 (zero-based positions) using .iloc?
print(df.iloc[3:8, 1:5])  # slice ends are exclusive
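iloc slices use zero-based positions and, like Python list slices, exclude the stop value, so the stop has to be one past the last row or column you want. A quick check on a made-up 10 x 6 frame whose cell values encode their position:

```python
import numpy as np
import pandas as pd

# Toy frame: the value at (row r, column c) is r * 6 + c
df = pd.DataFrame(np.arange(60).reshape(10, 6), columns=list("abcdef"))

block = df.iloc[3:8, 1:5]    # rows 3-7, columns 1-4 (positions 1-4 are b-e)
four_rows = df.iloc[3:7]     # stop 7 is excluded, so only rows 3-6

print(block.shape)
print(block.columns.tolist())
```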
How to get all rows but only selected columns, e.g. name and price?
df.loc[:, ["name", "price"]]
How to sort df by neighborhood group ascending and price descending?
df.sort_values(["neighbourhood_group", "price"], ascending=[True, False])
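ascending accepts one flag per sort key, applied in order: here the groups sort A to Z and, within each group, prices sort from high to low. Toy data for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Brooklyn"],
    "price": [200, 50, 300, 120],
})

# First key ascending, second key descending
sorted_df = df.sort_values(["neighbourhood_group", "price"], ascending=[True, False])
print(sorted_df)
```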
How to drop rows where price is missing, modifying df in place?
df.dropna(subset=["price"], inplace=True)
How to remove rows with any missing values?
df = df.dropna()
How to create a new column where prices over 100 are labelled "High" and everything else "Low"?
df["price_category"] = df["price"].apply(lambda x: "High" if x > 100 else "Low")
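apply() runs the lambda once per value. A vectorised alternative is np.where (a swapped-in NumPy technique, not part of the original card), which evaluates the condition on the whole column at once and produces the same labels. Sketch on invented prices; note that 100 itself is labelled Low because the test is strictly greater than:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [50, 100, 101, 250]})

# Row-by-row version from the card
df["price_category"] = df["price"].apply(lambda x: "High" if x > 100 else "Low")

# Vectorised version: one comparison over the whole column
df["price_category_np"] = np.where(df["price"] > 100, "High", "Low")

print(df)
```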
How to count the listings in each neighborhood group?
df[“neighbourhood_group”].value_counts()
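value_counts() and the groupby().size() card earlier give the same counts but differ in shape and order: value_counts() returns a Series sorted by count (largest first), while groupby().size().reset_index(...) returns a DataFrame sorted by group name. Sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Brooklyn", "Queens", "Brooklyn"],
})

# Series of counts, most frequent group first
counts = df["neighbourhood_group"].value_counts()

# Same counts as a DataFrame, ordered by group name
df_counts = df.groupby("neighbourhood_group").size().reset_index(name="num_listings")

print(counts)
print(df_counts)
```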