Pandas Pt 3 (UCSD) Flashcards
how do you stack dataframes left and right vertically
pd.concat([left, right]) ## yields 1 df w/ the union of both frames' cols; right's rows follow left's, NaN where a col is missing from one frame
how do you do an inner join on dataframes left and right (but columns repeated)
pd.concat([left, right], axis=1, join='inner') ## keeps only index labels present in both; cols shared by both frames appear twice
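A minimal sketch of the two concat cards above. The toy frames `left` and `right` and their column names are made up for illustration; they are not from the course material.

```python
import pandas as pd

# Hypothetical toy frames sharing one column name ("b").
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# Vertical stack: right's rows follow left's; columns are the union a, b, c.
stacked = pd.concat([left, right], ignore_index=True)

# Side by side: rows are aligned on the index, and "b" shows up twice.
side_by_side = pd.concat([left, right], axis=1, join="inner")

print(stacked)
print(side_by_side)
```

ignore_index=True is optional; without it the stacked frame keeps the original index labels (0, 1, 0, 1).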
what’s another means of vertical stacking left and right dataframes, using append?
left.append(right) ## deprecated in pandas 1.4 and removed in 2.0; use pd.concat([left, right]) instead
how do you merge left and right like a join, w/o repeating the columns
pd.merge(left, right, how='inner') ## by default joins on the cols the two frames have in common
merge movies and tags dataframe, on movie ID, inner join
t = movies.merge(tags, on='movieId', how='inner')
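A small sketch of the inner merge above. The rows in `movies` and `tags` are invented sample data; only the column name `movieId` comes from the card.

```python
import pandas as pd

# Hypothetical sample data; movieId 2 has no tags, movieId 4 has no movie.
movies = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
tags = pd.DataFrame({
    "movieId": [1, 1, 3, 4],
    "tag": ["fun", "pixar", "dark", "orphan"],
})

# Inner join: only movieIds present in both frames survive,
# and the key column appears once in the result.
t = movies.merge(tags, on="movieId", how="inner")
print(t)
```

A movie with several tags produces one output row per tag, so the result can have more rows than `movies`.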
get the first five rows of df that match bool filter1 and bool filter2
df[ filter1 & filter2 ][ :5 ]
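A short sketch of combining two boolean filters. The frame and the filter conditions are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "city": ["a", "b", "c", "d"],
    "pop": [10, 50, 70, 90],
    "coastal": [True, False, True, True],
})

filter1 = df["pop"] > 20      # boolean Series, one value per row
filter2 = df["coastal"]       # already boolean

# & combines the masks element-wise; [:5] keeps at most the first five matches.
result = df[filter1 & filter2][:5]
print(result)
```

When the conditions are written inline, each one needs its own parentheses, e.g. df[(df["pop"] > 20) & (df["coastal"])], because & binds tighter than >.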
split the string values in col ‘city’ in a df using a ‘_’ separator
df['city'].str.split('_') ## returns a new Series where each value is a list of the pieces split on '_'; df itself is unchanged unless you assign the result back
check if any value in the city col of df contains the substring ‘2’
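df['city'].str.contains('2').any() ## .str.contains alone returns a boolean Series (one value per row); .any() collapses it to a single True/False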
Series.str function to replace a substring
df[colName].str.replace(subToReplace, replacementSub) ## .str is a Series accessor, so select a column first
Series.str function to return the values matched by a regex
df[colName].str.extract(r'(regex)') ## the pattern needs at least one capture group; returns a df w/ one col per group (a Series if expand=False and there is a single group)
use split to break values out into new columns
df[colName].str.split(separator, expand=True) ## returns a df w/ one col per piece
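A sketch that runs the string cards above on one column. The city values and the regex are made up for illustration.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"city": ["san_diego", "new_york2", "austin"]})

lists = df["city"].str.split("_")                 # Series of lists
has_two = df["city"].str.contains("2")            # boolean Series
any_two = has_two.any()                           # single True/False
replaced = df["city"].str.replace("_", " ", regex=False)
first_part = df["city"].str.extract(r"^([a-z]+)") # DataFrame, column 0
parts = df["city"].str.split("_", expand=True)    # one column per piece

print(parts)
```

With expand=True, rows that split into fewer pieces than the longest row are padded with missing values.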
what is unix / posix / epoch
the number of seconds elapsed since 1970-01-01 00:00:00 UTC
what is datetime64[ns]
NumPy/pandas datetime dtype w/ nanosecond precision; lets you compare and sort times ## df['time'] > '2020-01-01'
convert from unix time to datetime64 format
pd.to_datetime(tags['timestampCol'], unit='s') ## unit='s' means the values are in seconds
sort values by a parsed time column in a df
df.sort_values(by='parsedTimeCol', ascending=True)
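A sketch tying the time cards together: convert Unix seconds, compare against a date string, then sort. The `tags` rows and the column names `timestamp` and `parsed_time` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: 1577836800 is 2020-01-01 UTC, 1609459200 is 2021-01-01 UTC.
tags = pd.DataFrame({
    "tag": ["b", "a", "c"],
    "timestamp": [1577836800, 0, 1609459200],
})

# Unix seconds -> pandas datetimes.
tags["parsed_time"] = pd.to_datetime(tags["timestamp"], unit="s")

# Datetime columns can be compared directly against a date string.
recent = tags[tags["parsed_time"] > "2020-01-01"]

# Oldest first.
ordered = tags.sort_values(by="parsed_time", ascending=True)
print(ordered)
```

The comparison is strict, so a row stamped exactly at 2020-01-01 00:00:00 is not included in `recent`.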