Pandas Joins Flashcards
What is the main library in Python for data manipulation and analysis?
Pandas
What is a join operation in pandas?
A join operation in pandas combines two or more dataframes based on a related column or index between them.
What function is used to perform SQL-like joins in pandas?
merge() function
What is the syntax for the merge function in pandas?
merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)
What does the ‘on’ parameter in the merge function do?
The ‘on’ parameter specifies the column or columns on which to perform the join.
What does the ‘how’ parameter in the merge function do?
The ‘how’ parameter specifies the type of join to perform: ‘left’, ‘right’, ‘outer’, ‘inner’ (the default), or ‘cross’.
What does a ‘left’ join do in pandas?
A ‘left’ join in pandas returns all the rows from the left dataframe and the matched rows from the right dataframe. Where a left row has no match, the columns coming from the right dataframe are filled with NaN.
What does a ‘right’ join do in pandas?
A ‘right’ join in pandas returns all the rows from the right dataframe and the matched rows from the left dataframe. Where a right row has no match, the columns coming from the left dataframe are filled with NaN.
What does an ‘inner’ join do in pandas?
An ‘inner’ join in pandas returns the rows that have matching values in both dataframes.
What does an ‘outer’ join do in pandas?
An ‘outer’ join in pandas returns all rows from both dataframes. Where a row has no match in the other dataframe, the columns coming from that dataframe are filled with NaN.
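A minimal sketch of the four join types described above. The dataframes `left` and `right` and their values are made up for illustration; only `merge` and its `how` and `on` parameters come from the cards.

```python
import pandas as pd

# Hypothetical example data: ids 2 and 3 appear in both frames.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

inner = left.merge(right, on="id", how="inner")    # only ids present in both
left_j = left.merge(right, on="id", how="left")    # every left id; score is NaN for id 1
right_j = left.merge(right, on="id", how="right")  # every right id; name is NaN for id 4
outer = left.merge(right, on="id", how="outer")    # every id from either frame

print(len(inner), len(left_j), len(right_j), len(outer))  # 2 3 3 4
```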
What are the left_on and right_on parameters in the merge function?
The left_on and right_on parameters allow you to specify the columns to join on if they have different names in the two dataframes.
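As a rough illustration of left_on and right_on, assuming two hypothetical dataframes (`employees` and `salaries`) whose key columns are named differently:

```python
import pandas as pd

# Hypothetical frames: the key is 'emp_id' on the left and 'employee' on the right.
employees = pd.DataFrame({"emp_id": [1, 2], "name": ["Ann", "Bob"]})
salaries = pd.DataFrame({"employee": [1, 2], "salary": [50, 60]})

merged = employees.merge(salaries, left_on="emp_id", right_on="employee")

# Both key columns are kept in the result; drop one if it is redundant.
merged = merged.drop(columns="employee")
print(merged)
```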
What does the sort parameter in the merge function do?
The sort parameter sorts the result dataframe by the join keys in lexicographical order. Default is False.
What function is used to concatenate Series or DataFrame objects along an axis in pandas?
concat() function
What does the join_axes parameter in the concat function do?
The join_axes parameter was deprecated in pandas 0.25.0 and removed in pandas 1.0. Use .reindex or .reindex_like on the result to achieve the same functionality.
What is the syntax for the concat function in pandas?
concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
What does the ‘axis’ parameter in the concat function do?
The ‘axis’ parameter specifies the axis to concatenate along. 0 is for index (rows) and 1 is for columns.
What does the ‘ignore_index’ parameter in the concat function do?
If ‘ignore_index’ is True, the index values along the concatenation axis are discarded and the resulting axis is labeled 0, …, n - 1.
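A short sketch of how axis and ignore_index change the result of concat. The frames `a` and `b` are illustrative assumptions.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

# axis=0 (default): rows are stacked and the original index labels are kept.
stacked = pd.concat([a, b])

# ignore_index=True: the row labels are replaced by 0, ..., n - 1.
renumbered = pd.concat([a, b], ignore_index=True)

# axis=1: frames are placed side by side, aligned on the row index.
side_by_side = pd.concat([a, b.rename(columns={"x": "y"})], axis=1)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```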
What is the main difference between merge and concat in pandas?
merge is used to combine dataframes based on a key/column, whereas concat is used to append dataframes along a particular axis.
What is the purpose of the keys parameter in the concat function?
The keys parameter is used to construct a hierarchical index, with the passed keys as the outermost level.
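One way to picture the keys parameter, using two hypothetical quarterly frames (`q1` and `q2` are made-up names):

```python
import pandas as pd

q1 = pd.DataFrame({"sales": [10, 20]})
q2 = pd.DataFrame({"sales": [30, 40]})

combined = pd.concat([q1, q2], keys=["q1", "q2"])

# The outer index level holds the keys; the inner level holds each frame's original index.
print(combined.index.tolist())  # [('q1', 0), ('q1', 1), ('q2', 0), ('q2', 1)]

# Selecting on the outer level recovers the rows that came from one input frame.
print(combined.loc["q2"])
```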
What does the ‘verify_integrity’ parameter in the concat function do?
The ‘verify_integrity’ parameter, if True, checks whether the new concatenated axis contains duplicates. If it does, it will raise an exception.
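A small sketch of verify_integrity in action, assuming two illustrative frames that share an index label:

```python
import pandas as pd

a = pd.DataFrame({"x": [1]}, index=[0])
b = pd.DataFrame({"x": [2]}, index=[0])  # same index label as `a`

try:
    pd.concat([a, b], verify_integrity=True)
    raised = False
except ValueError:
    # pandas reports the overlapping index values with a ValueError.
    raised = True

print(raised)  # True
```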
What is the join function in pandas?
The join function is used to combine columns of two potentially differently-indexed dataframes into a single dataframe.
What is the syntax for the join function in pandas?
join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
What do the ‘lsuffix’ and ‘rsuffix’ parameters in the join function do?
The ‘lsuffix’ and ‘rsuffix’ parameters are suffixes to add to overlapping column names in the left and the right side, respectively.
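A minimal sketch of an index-based join with overlapping column names. The frames and suffix strings are assumptions chosen for illustration.

```python
import pandas as pd

left = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"value": [10, 30]}, index=["a", "c"])

# Both frames have a 'value' column, so join needs suffixes to tell them apart;
# without them pandas raises an error about overlapping columns.
joined = left.join(right, lsuffix="_left", rsuffix="_right")

# Default how='left': rows 'a' and 'b' are kept; 'b' has no match on the right.
print(joined)
```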
How can you merge two dataframes df1 and df2 on a column ‘id’ using an inner join?
df1.merge(df2, on='id', how='inner')
How can you concatenate two dataframes df1 and df2 along the column axis?
pd.concat([df1, df2], axis=1)
How can you join two dataframes df1 and df2 using the indexes?
df1.join(df2)
When would you use merge over join in pandas?
When you need to combine dataframes on one or more key columns rather than on their index. merge can also join on the index through left_index and right_index.
When would you use join over merge in pandas?
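When you want to combine dataframes on their index. join defaults to an index-on-index left join, and its on parameter can match a column of the calling dataframe against the index of the other.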
When would you use concat over merge or join in pandas?
When you want to append dataframes along a particular axis (either rows or columns) rather than combining them based on a key or index.
What does a suffix do when merging two dataframes with overlapping column names in pandas?
A suffix is added to the overlapping column names to maintain their identity after the merge.
How can you add a suffix when merging two dataframes df1 and df2 on a column ‘id’?
df1.merge(df2, on='id', suffixes=('_df1', '_df2'))
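To see what the suffixes do in practice, here is a sketch with made-up data in which both frames carry an `amount` column. When suffixes is not passed, pandas falls back to `_x` and `_y`.

```python
import pandas as pd

# Hypothetical frames that share a non-key column name, 'amount'.
df1 = pd.DataFrame({"id": [1, 2], "amount": [5, 6]})
df2 = pd.DataFrame({"id": [1, 2], "amount": [7, 8]})

merged = df1.merge(df2, on="id", suffixes=("_df1", "_df2"))
default = df1.merge(df2, on="id")  # uses the default suffixes

print(list(merged.columns))   # ['id', 'amount_df1', 'amount_df2']
print(list(default.columns))  # ['id', 'amount_x', 'amount_y']
```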
What happens if you try to merge two dataframes with different shapes in pandas?
You can merge dataframes with different shapes. The resulting dataframe’s shape will depend on the type of join used and the data in the dataframes.
How can you perform an outer join on two dataframes df1 and df2 on a column ‘id’?
df1.merge(df2, on='id', how='outer')
What is the difference between a one-to-one, many-to-one, and many-to-many join in pandas?
In a one-to-one join, the key is unique in both dataframes, so each row of the first dataframe is merged with at most one row of the second. In a many-to-one join, the key is unique in the second dataframe but may repeat in the first, so several rows of the first dataframe are merged with the same row of the second. In a many-to-many join, the key repeats in both dataframes, and each key produces every combination of its matching rows from the two sides.
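A sketch of these relationships using merge's validate parameter, which checks the key relationship and raises a MergeError when it does not hold. The frames `orders`, `customers`, and `visits` are hypothetical.

```python
import pandas as pd

orders = pd.DataFrame({"cust": [1, 1, 2], "item": ["pen", "ink", "pad"]})
customers = pd.DataFrame({"cust": [1, 2], "city": ["Oslo", "Rome"]})
visits = pd.DataFrame({"cust": [1, 1], "day": ["Mon", "Tue"]})

# Many-to-one: 'cust' repeats in orders but is unique in customers.
m2o = orders.merge(customers, on="cust", validate="many_to_one")

# Many-to-many: 'cust' 1 appears twice on each side, giving 2 x 2 = 4 rows.
m2m = orders.merge(visits, on="cust")

# Asking pandas to confirm a one-to-one relationship fails for this data.
try:
    orders.merge(visits, on="cust", validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

print(len(m2o), len(m2m), raised)  # 3 4 True
```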