Pandas Pt 3 (UCSD) Flashcards

Question 1

Q

how do you stack dataframes left and right vertically

Answer

A

pd.concat([left, right]) ## returns one df with the rows of left followed by the rows of right; the columns are the union of both frames' columns (NaN where a frame lacks a column)
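A minimal sketch of the vertical stack, assuming two made-up frames (the column names `id`, `a`, and `b` are hypothetical). It shows that the default is an outer join on columns, so missing values are filled with NaN:

```python
import pandas as pd

# Hypothetical toy frames; only the "id" column is shared.
left = pd.DataFrame({"id": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"id": [3, 4], "b": ["p", "q"]})

# Default axis=0 and join="outer": rows are stacked, columns are unioned.
# ignore_index=True renumbers the rows 0..n-1 instead of keeping 0,1,0,1.
stacked = pd.concat([left, right], ignore_index=True)
print(stacked)
```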

Question 2

Q

how do you do an inner join on dataframes left and right (but columns repeated)

Answer

A

pd.concat([left, right], axis=1, join='inner') ## aligns on the index and keeps only the index labels found in both frames; columns present in both frames appear twice
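A small sketch of the side-by-side case, assuming hypothetical frames whose indexes only partly overlap. It illustrates that the inner join here is on the index, not on a key column, which is why `id` shows up twice:

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels.
left = pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"]}, index=[0, 1, 2])
right = pd.DataFrame({"id": [2, 3, 4], "b": ["p", "q", "r"]}, index=[1, 2, 3])

# axis=1 places the frames next to each other; join="inner" keeps only
# the index labels present in both (1 and 2 here).
side_by_side = pd.concat([left, right], axis=1, join="inner")
print(side_by_side)
```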

Question 3

Q

what’s another means of vertical stacking left and right dataframes, using append?

Answer

A

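left.append(right) ## deprecated in pandas 1.4 and removed in 2.0; on current versions use pd.concat([left, right]) instead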

Question 4

Q

how do you merge left and right like a join, w/o repeating the columns

Answer

A

pd.merge(left, right, how='inner') ## with no on= argument it joins on the columns both frames share, so each key column appears once
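As a rough illustration, assuming two hypothetical frames that share only an `id` column, the merge below keeps the rows whose `id` occurs in both and emits `id` a single time:

```python
import pandas as pd

# Hypothetical frames; "id" is the only column they have in common.
left = pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"id": [2, 3, 4], "b": ["p", "q", "r"]})

# No on= given, so pandas joins on the shared column(s), here "id".
merged = pd.merge(left, right, how="inner")
print(merged)
```

Passing `on='id'` explicitly, as in the movies/tags card, gives the same result here and makes the join key obvious to the reader.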

Question 5

Q

merge movies and tags dataframe, on movie ID, inner join

Answer

A

t = movies.merge(tags, on='movieId', how='inner')

Question 6

Q

get the first five rows of df that match bool filter1 and bool filter2

Answer

A

df[ filter1 & filter2 ][ :5 ]
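A short sketch of combining two boolean filters, assuming a made-up frame with `rating` and `year` columns (both names are hypothetical). Each filter is a boolean Series, and `&` combines them element-wise:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "rating": [5.0, 3.5, 4.5, 2.0, 4.0, 5.0, 4.5, 4.0, 4.5],
    "year": [1995, 2001, 1999, 2010, 2005, 2015, 1990, 2020, 2018],
})

filter1 = df["rating"] >= 4.0   # boolean Series
filter2 = df["year"] >= 1995    # boolean Series

# Rows where both conditions hold, limited to the first five matches.
# .head(5) is equivalent to the [:5] slice used on the card.
first_five = df[filter1 & filter2].head(5)
print(first_five)
```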

Question 7

Q

split the string values in col ‘city’ in a df using a ‘_’ separator

Answer

A

df['city'].str.split('_') ## returns a new Series in which each value is a list of the pieces split on '_'; df itself is not modified

Question 8

Q

check if any value in the city col of df contains the substring ‘2’

Answer

A

df['city'].str.contains('2') ## boolean Series, one value per row; add .any() to get a single True/False
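A minimal sketch, assuming a hypothetical `city` column that includes one missing value. It separates the per-row mask from the single yes/no answer the question asks for:

```python
import pandas as pd

# Hypothetical data; the None shows how missing values are handled.
df = pd.DataFrame({"city": ["city_1", "city_2", "city_22", None]})

# Per-row result; na=False treats missing values as "does not contain".
mask = df["city"].str.contains("2", na=False)

# Collapse to one answer: does any row contain the substring?
has_any = mask.any()
print(mask.tolist(), has_any)
```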

Question 9

Q

df[colName].str.func() to replace a substring

Answer

A

df[colName].str.replace(subToReplace, replacementSub) ## .str is available on a column (Series), not on the whole df
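A brief sketch of the replacement, assuming a hypothetical `city` column. Passing `regex=False` explicitly is a choice made here so the pattern is treated as a literal string regardless of the pandas version:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"city": ["city_1", "city_2"]})

# Returns a new Series; assign it back to keep the change.
replaced = df["city"].str.replace("city_", "town-", regex=False)
print(replaced.tolist())
```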

Question 10

Q

df[colName].str function to return the values matched by a regex

Answer

A

df[colName].str.extract('regex') ## the regex needs at least one capture group; returns a df with one column per group, or a Series if expand=False and there is a single group
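A small sketch of both return shapes, assuming a hypothetical `title` column with a four-digit year in parentheses. Rows that do not match the pattern come back as missing values:

```python
import pandas as pd

# Hypothetical data; the last row has no year to extract.
df = pd.DataFrame({"title": ["Toy Story (1995)", "Heat (1995)", "No Year"]})

pattern = r"\((\d{4})\)"  # one capture group: the four digits

# Default expand=True: a DataFrame with one column per capture group.
years_df = df["title"].str.extract(pattern)

# expand=False with a single group: a Series instead.
years = df["title"].str.extract(pattern, expand=False)
print(years)
```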

Question 11

Q

use split to break values out into new columns

Answer

A

df[colName].str.split(separator, expand=True) ## returns a df with one new column per piece
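To contrast the two forms of split, here is a sketch assuming a hypothetical `city` column. Without `expand` each value becomes a list; with `expand=True` the pieces are spread across numbered columns:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"city": ["san_diego", "los_angeles"]})

# Each value becomes a list of pieces.
lists = df["city"].str.split("_")

# Each piece gets its own column, labelled 0, 1, ...
parts = df["city"].str.split("_", expand=True)
print(parts)
```

The columns of `parts` can then be assigned back, for example `df[["first", "second"]] = parts`, where the new column names are whatever suits the data.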

Question 12

Q

what is unix / posix / epoch time

Answer

A

the number of seconds elapsed since 1970-01-01 00:00:00 UTC (the epoch)

Question 13

Q

what is datetime64[ns]

Answer

A

the NumPy / pandas dtype for timestamps stored with nanosecond precision; columns of this dtype can be compared against dates ## df['time'] > '2020-01-01'

Question 14

Q

convert from unix time to datetime64 format

Answer

A

pd.to_datetime(tags['timestampCol'], unit='s') ## unit='s' means the input values are seconds since the epoch
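A minimal sketch of the conversion, assuming a hypothetical `tags` frame with an integer `timestamp` column of Unix seconds. Once parsed, the column can be compared directly against a date string:

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds:
# 0 is the epoch, 86400 is one day later, 1577836800 is 2020-01-01 UTC.
tags = pd.DataFrame({"timestamp": [0, 86400, 1577836800]})

# Interpret the integers as seconds since 1970-01-01 00:00:00 UTC.
tags["parsed_time"] = pd.to_datetime(tags["timestamp"], unit="s")

# Datetime columns compare directly against date strings.
recent = tags[tags["parsed_time"] >= "2020-01-01"]
print(tags)
```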

Question 15

Q

sort values by a parsed time column in a df

Answer

A

df.sort_values(by='parsedTimeCol', ascending=True)
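A short sketch of sorting by a parsed time column, assuming made-up `tag` and `parsed_time` columns. The sort returns a new frame and leaves the original order untouched:

```python
import pandas as pd

# Hypothetical data; parsed_time is already a datetime column.
df = pd.DataFrame({
    "tag": ["b", "c", "a"],
    "parsed_time": pd.to_datetime(["2015-06-01", "2020-01-01", "2009-03-15"]),
})

# Oldest first; use ascending=False for newest first.
ordered = df.sort_values(by="parsed_time", ascending=True)
print(ordered)
```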

(15 cards)