Pandas Flashcards

1
Q

How to check the first 5 rows of df?

A

df.head()

2
Q

How to load a CSV file (with ; delimiter)?

A

df = pd.read_csv("filename.csv", delimiter=";")
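
A minimal sketch of the same call, assuming a made-up two-row file held in memory instead of "filename.csv". In read_csv, delimiter is an alias for sep, so either keyword should behave the same way.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for "filename.csv" (invented data).
raw = io.StringIO("name;price\nLoft;120\nStudio;80\n")

# sep=";" does the same job as delimiter=";".
df = pd.read_csv(raw, sep=";")
print(df)
```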

3
Q

How to get number of rows and columns?

A

df.shape

4
Q

How to get summary statistics of numerical variables?

A

df.describe()

5
Q

How to check for missing values?

A

df.isna().sum()
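
A small sketch of what the call returns, using an invented frame: isna() marks each missing cell as True, and sum() then counts those per column. Without the trailing parentheses, sum is only a reference to the method and nothing is counted.

```python
import numpy as np
import pandas as pd

# Invented data with some missing values.
df = pd.DataFrame({
    "price": [100.0, np.nan, 80.0],
    "reviews_per_month": [np.nan, np.nan, 1.5],
})

# One count of missing cells per column.
missing = df.isna().sum()
print(missing)
```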

6
Q

How to get only rows where the neighborhood is Manhattan?

A

subset = df[df["neighborhood"] == "Manhattan"]

7
Q

How to get only rows where room type is private and neighborhood is Brooklyn?

A

subset = df[(df["room_type"] == "Private room") & (df["neighborhood"] == "Brooklyn")]
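
The parentheses around each condition matter because & binds more tightly than == in Python. One way to keep this readable is to name each mask first; the sketch below uses invented rows and the hypothetical names is_private and in_brooklyn.

```python
import pandas as pd

# Invented listings for illustration.
df = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room"],
    "neighborhood": ["Brooklyn", "Brooklyn", "Manhattan"],
    "price": [60, 150, 90],
})

# Each comparison yields a boolean Series (a mask).
is_private = df["room_type"] == "Private room"
in_brooklyn = df["neighborhood"] == "Brooklyn"

# & combines the masks element-wise; only rows True in both survive.
subset = df[is_private & in_brooklyn]
print(subset)
```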

8
Q

How to sort rows by price in descending order?

A

sorted_df = df.sort_values(by="price", ascending=False)

9
Q

How to create a new column price_per_night by dividing price by minimum_nights?

A

df["price_per_night"] = df["price"] / df["minimum_nights"]

10
Q

How to convert ‘last_review’ to datetime format?

A

df["last_review"] = pd.to_datetime(df["last_review"], errors="coerce")
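
A short sketch of what errors="coerce" does, assuming an invented column with one valid date, one unparseable string, and one missing value: entries that cannot be parsed become NaT instead of raising an error.

```python
import pandas as pd

# Invented values: a valid date, a bad string, and a missing entry.
df = pd.DataFrame({"last_review": ["2019-05-21", "not a date", None]})

# Unparseable entries are turned into NaT rather than raising.
df["last_review"] = pd.to_datetime(df["last_review"], errors="coerce")
print(df["last_review"])
```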

11
Q

How to extract the year from “last_review”?

A

df["review_year"] = df["last_review"].dt.year

12
Q

How to find the average price of listings per neighbourhood_group?

A

avg_df = df.groupby("neighbourhood_group")["price"].mean().reset_index()
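
A minimal sketch on invented prices: groupby(...).mean() returns a Series indexed by group, and reset_index() turns that index back into an ordinary column so the result is a two-column DataFrame.

```python
import pandas as pd

# Invented listings for illustration.
df = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Brooklyn", "Manhattan"],
    "price": [60, 200, 100, 100],
})

# Mean price per group, with the group key restored as a column.
avg_df = df.groupby("neighbourhood_group")["price"].mean().reset_index()
print(avg_df)
```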

13
Q

How to fill missing values in ‘reviews_per_month’ with 0?

A

df["reviews_per_month"] = df["reviews_per_month"].fillna(0)

14
Q

How to drop rows where price is missing?

A

df = df.dropna(subset=["price"])

15
Q

How to group by neighborhood and count the listings?

A

df_counts = df.groupby("neighbourhood_group").size().reset_index(name="num_listings")

16
Q

How to find the row with the highest price (returns the index label)?

A

most_exp = df["price"].idxmax()
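
idxmax() gives back the index label of the first maximum, not the row itself and not its position. A sketch of fetching the full row with .loc, assuming an invented frame whose index holds listing IDs:

```python
import pandas as pd

# Invented listings; the index plays the role of a listing ID.
df = pd.DataFrame(
    {"name": ["Loft", "Studio", "Penthouse"], "price": [120, 80, 900]},
    index=[101, 102, 103],
)

# Index label of the highest price.
most_exp = df["price"].idxmax()

# Look the label up with .loc to get the whole row.
row = df.loc[most_exp]
print(most_exp, row["name"])
```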

17
Q

How to access multiple columns?

A

print(df[["var1", "var2", "var3"]])

18
Q

How to count unique room types?

A

print(df["room_type"].nunique())

19
Q

How to see last 5 rows?

A

print(df.tail())

20
Q

How to show all rows where neighborhood is Manhattan using .loc?

A

print(df.loc[df["neighbourhood"] == "Manhattan"])

21
Q

How to retrieve only the name, room_type, and price columns for all listings in Brooklyn?

A

print(df.loc[df["neighbourhood_group"] == "Brooklyn", ["name", "room_type", "price"]])

22
Q

How to get rows 3 to 7 and columns 1 to 4 using .iloc ?

A

print(df.iloc[3:8, 1:5])
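
.iloc slices are half-open: the start position is included and the stop position is excluded. A small sketch on an invented 10 x 6 frame shows which positions the slices 3:8 and 1:5 cover.

```python
import pandas as pd

# Invented frame: 10 rows, columns c0..c5, default integer index.
df = pd.DataFrame({f"c{i}": range(10) for i in range(6)})

# Rows at positions 3..7 and columns at positions 1..4 (stops excluded).
block = df.iloc[3:8, 1:5]
print(block)
```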

23
Q

How to get all rows but only selected columns, e.g. name and price?

A

df.loc[:, ["name", "price"]]

24
Q

How to sort df by neighborhood ascending and price descending?

A

df.sort_values(["neighbourhood_group", "price"], ascending=[True, False])
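
A sketch on invented rows: the ascending list is matched to the sort columns by position, so groups come out alphabetically and prices run from high to low inside each group.

```python
import pandas as pd

# Invented listings for illustration.
df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Brooklyn", "Manhattan"],
    "price": [100, 60, 90, 300],
})

# First key ascending, second key descending.
sorted_df = df.sort_values(
    ["neighbourhood_group", "price"], ascending=[True, False]
)
print(sorted_df)
```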

25
Q

How to drop rows where price is missing?

A

df.dropna(subset=["price"], inplace=True)

26
Q

How to remove rows with missing values?

A

df.dropna()

27
Q

How to create a new var where price over 100 is labelled high and everything else is low?

A

df["price_category"] = df["price"].apply(lambda x: "High" if x > 100 else "Low")
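
As a possible alternative to apply with a lambda, numpy.where can label the whole column in one vectorized step. This is a swapped-in technique, not part of the card's answer; the column name price_category_np and the data are invented for comparison.

```python
import numpy as np
import pandas as pd

# Invented prices, including the boundary value 100.
df = pd.DataFrame({"price": [50, 100, 101, 250]})

# Row-by-row labelling, as on the card.
df["price_category"] = df["price"].apply(
    lambda x: "High" if x > 100 else "Low"
)

# Vectorized alternative: pick "High" where the condition holds, else "Low".
df["price_category_np"] = np.where(df["price"] > 100, "High", "Low")
print(df)
```

Both columns should agree; note that a price of exactly 100 is labelled "Low" because the comparison is strict.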

28
Q

How to get the number of values for each neighborhood group?

A

df["neighbourhood_group"].value_counts()
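
A brief sketch comparing this with the groupby(...).size() approach, on invented data: both count rows per group, but value_counts() returns a Series ordered by count (largest first), while the groupby version here is ordered by group name and reshaped into a DataFrame.

```python
import pandas as pd

# Invented listings for illustration.
df = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Brooklyn", "Queens", "Brooklyn"],
})

# Series of counts, most frequent group first.
counts = df["neighbourhood_group"].value_counts()

# Same counts as a DataFrame, ordered by group name.
by_size = (
    df.groupby("neighbourhood_group").size().reset_index(name="num_listings")
)
print(counts)
print(by_size)
```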