Uniqueness Constraints Flashcards
What is a list that contains duplicates when there are at least two identical elements in the list?
duplicated values
How do you find duplicated values?
dataframe. duplicated()
1. subset: list of column names to check for duplication
2. keep: weather to keep first (‘first’), last (‘last’) name or all (False) duplicate values
duplicates= dataframe.duplicated(subset = column, keep= False)
3.dataframe[duplicates]. sort_values(by= first name)
What’s a built-in function that is available in a standard installation of Python. with no additional arguments or parameters, is ordering the values in numbers in ascending order, meaning smallest to largest?
.sort_values
How do you drop a duplicate?
dataframe. drop_duplicated()
1. subset: list of column names to check for duplication
2. keep: weather to keep first (‘first’), last (‘last’) name or all (False) duplicate values
duplicates= dataframe.duplicated(subset = column, keep= False)
- inpalce: drop rows directly into the Dataframe without creating new object (True)
dataframe. drop_duplicates(inplace= True) - dataframe[duplicates]. sort_values(by= first name)
What operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
.groupby()
dataframe.groupby(by= colmunn)
What function is used to apply some aggregation across one or more column. Aggregate using callable, string, dict, or list of string/callables. Most frequently used aggregations are:
sum: Return the sum of the values for the requested axis
min: Return the minimum of the values for the requested axis
max: Return the maximum of the values for the requested axis
.agg()