Rows Flashcards
How do you filter a DataFrame to rows where a value is greater than a threshold, using where?
.where(year($"birthdate") > 1980)
How do you filter, using filter, to a specific month?
.filter(month($"birthdate") === 1)
Using a SQL expression, how do you filter a DataFrame?
.where("day(birthdate) > 15")
How do you check for inequality between a column and a value when filtering?
=!=
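The filtering cards above can be tried together in a small script. This is only a sketch: the `people` DataFrame, its sample rows, and the local SparkSession settings are assumptions made for illustration, not part of the deck.

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{month, year}

// Assumed local session, for experimentation only
val spark = SparkSession.builder().master("local[*]").appName("filter-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data
val people = Seq(
  ("Ada", Date.valueOf("1975-01-20")),
  ("Bo",  Date.valueOf("1985-01-05")),
  ("Cy",  Date.valueOf("1990-06-17"))
).toDF("name", "birthdate")

// Column expression with where: keeps Bo and Cy
val bornAfter1980 = people.where(year($"birthdate") > 1980)

// filter is an alias of where; === builds an equality Column: keeps Ada and Bo
val bornInJanuary = people.filter(month($"birthdate") === 1)

// SQL expression string: keeps Ada and Cy
val lateInMonth = people.where("day(birthdate) > 15")

// =!= builds an inequality Column: keeps Cy
val notJanuary = people.filter(month($"birthdate") =!= 1)

bornAfter1980.show()
```

The Column operators `===` and `=!=` are used instead of `==` and `!=` because the latter compare the Column objects themselves rather than building a row-level expression.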
How do you make sure your DataFrame has only unique rows, taking all columns into account?
.distinct
How can you remove duplicates while taking only one column into account?
.dropDuplicates("column")
How can you remove duplicates while taking only a specific set of columns into account?
.dropDuplicates(List("column1", "column2"))
If dropDuplicates is called without any columns, which columns are used to determine distinct rows, or does it fail?
All columns, equivalent to .distinct
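The deduplication cards above can be compared side by side by counting rows. This is a sketch under assumed data: the `rows` DataFrame and the session settings are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session, for experimentation only
val spark = SparkSession.builder().master("local[*]").appName("dedup-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data with one fully duplicated row
val rows = Seq(("a", 1), ("a", 1), ("a", 2), ("b", 2)).toDF("column1", "column2")

// All columns considered: (a,1), (a,2), (b,2) remain
val allColumns = rows.distinct().count()

// No arguments: same behaviour as distinct
val noArgs = rows.dropDuplicates().count()

// Only column1 considered: one row per value of column1 ("a" and "b")
val oneColumn = rows.dropDuplicates("column1").count()

// Both columns listed explicitly: same result as distinct for this data
val twoColumns = rows.dropDuplicates(List("column1", "column2")).count()

println(s"$allColumns $noArgs $oneColumn $twoColumns")
```

When only a subset of columns is used, which of the duplicate rows survives is not guaranteed, so the sketch checks counts rather than specific rows.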
How can you filter out rows with a null value in a given column?
.where($"column_object".isNotNull)
How can you drop rows in which every value is null?
.na.drop(how = "all")
How do you remove a row where any value in the row is null?
.na.drop("any")
How do you remove a row only if both of two specific columns are null?
.na.drop("all", Seq("column_a", "column_b"))
How do you remove a row with a null value in either of two columns?
.na.drop("any", Seq("column_a", "column_b"))
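The null-handling cards above differ only in which rows they remove, which is easiest to see on a tiny table. This is a sketch: the `df` DataFrame, the extra column `n`, and the session settings are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session, for experimentation only
val spark = SparkSession.builder().master("local[*]").appName("na-drop-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data; None becomes null in the DataFrame
val df = Seq[(Option[String], Option[String], Option[Int])](
  (Some("x"), Some("y"), Some(1)),   // no nulls
  (None,      Some("y"), Some(2)),   // column_a is null
  (None,      None,      Some(3)),   // column_a and column_b are null
  (None,      None,      None)       // every value is null
).toDF("column_a", "column_b", "n")

// Keeps rows where column_a is not null: first row only
val notNullA = df.where($"column_a".isNotNull).count()

// Drops only the row in which every value is null: three rows remain
val dropAll = df.na.drop("all").count()

// Drops any row containing at least one null: one row remains
val dropAny = df.na.drop("any").count()

// Drops rows where both listed columns are null: two rows remain
val dropAllAB = df.na.drop("all", Seq("column_a", "column_b")).count()

// Drops rows where either listed column is null: one row remains
val dropAnyAB = df.na.drop("any", Seq("column_a", "column_b")).count()

println(s"$notNullA $dropAll $dropAny $dropAllAB $dropAnyAB")
```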
How can you replace null values with "nope"?
.na.fill("nope")
True/False
.na.fill("nope") will only replace nulls in columns whose type is string
True
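The True/False card above can be checked with a two-column table that has one string column and one integer column. This is a sketch: the `df` DataFrame, its column names `label` and `n`, and the session settings are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session, for experimentation only
val spark = SparkSession.builder().master("local[*]").appName("na-fill-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one string column, one integer column
val df = Seq[(Option[String], Option[Int])](
  (Some("x"), Some(1)),
  (None,      None)
).toDF("label", "n")

// A string fill value is applied to string columns only
val filled = df.na.fill("nope")

// The null in the string column was replaced
val replacedLabels = filled.where($"label" === "nope").count()

// The null in the integer column is still null
val remainingNullInts = filled.where($"n".isNull).count()

filled.show()
```

To fill the integer column as well, a separate call with a numeric value, such as `.na.fill(0)`, would be needed.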