Columns Flashcards
What are the different ways you can select columns from customerDf?
.select("column_name"), .select('column_name), .select($"column_name"), .select(col("column_name")), .select(column("column_name")), .select(customerDf.col("column_name"))
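A minimal sketch of the selection forms above, written as a Scala script (for example a scala-cli `.sc` file with spark-sql on the classpath, or pasted into spark-shell). The sample `customerDf` and its `firstname`/`lastname` columns are hypothetical. `Symbol("firstname")` stands in for the `'firstname` literal, which is deprecated in Scala 2.13 and not supported in Scala 3.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, column}

// Local session for experimenting; inside spark-shell this returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("select-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for customerDf
val customerDf = Seq(("Ada", "Lovelace"), ("Alan", "Turing")).toDF("firstname", "lastname")

val byString = customerDf.select("firstname")                 // plain string
val bySymbol = customerDf.select(Symbol("firstname"))         // same as 'firstname in Scala 2
val byDollar = customerDf.select($"firstname")                // $ interpolator from spark.implicits
val byCol    = customerDf.select(col("firstname"))            // functions.col
val byColumn = customerDf.select(column("firstname"))         // functions.column
val byDfCol  = customerDf.select(customerDf.col("firstname")) // bound to this DataFrame

// Different column-object forms can be mixed in one call
val mixed = customerDf.select($"firstname", col("lastname"))
```

Every variant produces the same one-column DataFrame; the choice is mostly style, although `customerDf.col(...)` is handy for disambiguating columns after a join.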
How do you combine two columns?
.select(expr("concat(firstname, lastname) name")) (concat adds no separator; the trailing name is the alias of the new column)
Is there a select overload that takes both strings and column objects?
No. You can mix different kinds of column objects in one select (for example $"firstname" and col("lastname")), but you cannot mix strings and column objects.
How can you select columns using SQL expressions?
.selectExpr("column_name")
With SQL expressions, how do you select and rename a column?
.selectExpr("birthdate birthday") (the AS keyword is optional: "birthdate AS birthday")
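A short sketch contrasting `expr` inside `select` with `selectExpr`, written as a Scala script. The one-row customer table and its `birthdate` column are hypothetical sample data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder().master("local[*]").appName("expr-demo").getOrCreate()
import spark.implicits._

// Hypothetical customer data
val customerDf = Seq(("Ada", "Lovelace", "1815-12-10")).toDF("firstname", "lastname", "birthdate")

// expr(): a single SQL expression (with a trailing alias) used inside select()
val combined = customerDf.select(expr("concat(firstname, lastname) name"))

// selectExpr(): every argument is a SQL expression; "birthdate birthday" selects and renames
val renamed = customerDf.selectExpr("firstname", "birthdate birthday")

combined.show()
renamed.printSchema()
```

`selectExpr("a", "b")` behaves like `select(expr("a"), expr("b"))`, so it is convenient when every column in the projection is easiest to write as SQL.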
How do you see all the columns of a DataFrame?
customerDf.columns (returns the column names as an Array[String])
How do you rename a column in the DataFrame?
.withColumnRenamed("old_name", "new_name")
True or false: if you rename a column that does not exist, Spark will fail.
False. It succeeds but does nothing (the call is a no-op).
Can withColumnRenamed take column objects?
No, it takes strings only.
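A minimal sketch of listing and renaming columns, including the silent no-op when the old name is missing. It is a Scala script; `customerDf` and the `surname`/`no_such_column` names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("rename-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for customerDf
val customerDf = Seq(("Ada", "Lovelace")).toDF("firstname", "lastname")

println(customerDf.columns.mkString(", "))

// Both arguments are plain strings: existing name first, then the new name
val renamed = customerDf.withColumnRenamed("lastname", "surname")

// Renaming a column that does not exist raises no error and changes nothing
val unchanged = customerDf.withColumnRenamed("no_such_column", "whatever")

renamed.printSchema()
```

Because a misspelled old name fails silently, checking `columns` after a rename is a cheap safeguard.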
How do you print the schema of the DataFrame?
.printSchema()
How can you change the data type of a column without using Spark's type classes?
.select($"column_name".cast("long"))
How can you change the data type of a column using Spark's type classes?
import org.apache.spark.sql.types._ (the _ imports all the types)
.select($"column_name".cast(StringType))
How can you change the data type of a column using a select expression?
.selectExpr("cast(complex_object.property_in_there[0] as double) rename_if_want")
This example casts element 0 of a field nested inside a complex-type column to double and renames the result. The same syntax works with any column.
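A sketch of the three casting styles side by side, written as a Scala script. The data is hypothetical: an `id` stored as text, and a struct column `stats` wrapping an array of strings so the nested `stats.scores[0]` path mirrors the complex-type example above (SQL array indexing with `[ ]` is zero-based).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types._ // LongType, StringType, DoubleType, ...

val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
import spark.implicits._

// Hypothetical data: numeric id held as text, plus a struct containing an array of strings
val df = Seq(("42", Seq("1.5", "2.5"))).toDF("id", "scores")
  .select($"id", struct($"scores").as("stats"))

// 1. Target type given by its string name
val viaName = df.select($"id".cast("long"))
// 2. Target type given as a Spark DataType object
val viaType = df.select($"id".cast(LongType))
// 3. Cast inside a SQL expression: first array element of a nested field, renamed
val viaSql = df.selectExpr("cast(stats.scores[0] as double) first_score")

viaSql.printSchema()
```

The string form and the `DataType` form build the same cast; the `DataType` form lets the compiler catch a misspelled type name.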
How can you add a column that combines two existing columns with a space between them?
.withColumn("new_column_name", concat_ws(" ", $"first_column", $"second_column"))
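A minimal sketch of the `concat_ws` card as a Scala script, using a hypothetical two-row `customerDf`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.concat_ws

val spark = SparkSession.builder().master("local[*]").appName("concat-ws-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for customerDf
val customerDf = Seq(("Ada", "Lovelace"), ("Alan", "Turing")).toDF("firstname", "lastname")

// concat_ws(separator, columns*) joins the values with the separator between them
val withFullName = customerDf.withColumn("full_name", concat_ws(" ", $"firstname", $"lastname"))

withFullName.show()
```

One reason to prefer `concat_ws` over `concat` here: `concat_ws` skips null inputs, while `concat` returns null if any input is null.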
How can you remove a column using a string?
.drop("column_name")
How do you remove multiple columns using strings?
.drop("column1", "column2")
Can drop take column objects?
Yes, for example .drop($"column_name")
What are the different ways you can write a column object?
'column_name, $"column_name", col("column_name"), column("column_name"), customerDf.col("column_name")
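Using column objects, how do you remove multiple columns?
With drop(col: Column) you remove one column per call, so chain the calls, for example .drop($"column1").drop($"column2"). (Newer releases, from Spark 3.4, also accept several column objects in one drop call.)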
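A sketch of the `drop` variants as a Scala script. The four-column `wide` table is hypothetical. Chaining single-column `drop` calls is used for the multi-column case because it works across Spark versions.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("drop-demo").getOrCreate()
import spark.implicits._

// Hypothetical wider customer table
val wide = Seq(("Ada", "Lovelace", 36, "London")).toDF("firstname", "lastname", "age", "city")

val oneByName   = wide.drop("city")                    // one string
val manyByName  = wide.drop("age", "city")             // several strings in one call
val oneByColumn = wide.drop($"city")                   // one column object
val chained     = wide.drop($"age").drop($"city")      // several column objects, one call each

chained.printSchema()
```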
How do you create a new column by multiplying two other columns together?
.withColumn("column_name", $"column_a" * $"column_b")
How do you create a new column by dividing two other columns, using an expression?
.withColumn("new_column", expr("column1 / column2"))
Using expr is not required ($"column1" / $"column2" works too); this just shows the expression form.
How do you create a new column by rounding another column?
.withColumn("new_column", round($"column_name", 2))
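The multiply, divide, and round cards can be chained in one sketch, written as a Scala script. The `width`/`height` data and the `area`, `aspect`, and `aspect_2dp` names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, round}

val spark = SparkSession.builder().master("local[*]").appName("arithmetic-demo").getOrCreate()
import spark.implicits._

// Hypothetical rectangle sizes
val sizes = Seq((16.0, 9.0), (4.0, 3.0)).toDF("width", "height")

val result = sizes
  .withColumn("area", $"width" * $"height")          // Column arithmetic
  .withColumn("aspect", expr("width / height"))      // same idea written as a SQL expression
  .withColumn("aspect_2dp", round($"aspect", 2))     // round to 2 decimal places

result.show()
```

Each `withColumn` can refer to columns added by the previous call, which is why `aspect_2dp` can round `aspect`.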
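With .toDF, how do you specify column names?
.toDF("column_name1", "column_name2") (one name per existing column, in order; the number of names must match the number of columns)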
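A minimal sketch of `toDF` renaming, as a Scala script with hypothetical data. Tuples converted without names get the default names `_1`, `_2`, and so on.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("todf-demo").getOrCreate()
import spark.implicits._

// No names given: tuple fields become _1, _2
val raw = Seq(("Ada", 36), ("Alan", 41)).toDF()

// toDF(names*) renames every column, in order, with exactly one name per column
val named = raw.toDF("name", "age")

named.printSchema()
```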
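Where does .withColumn put a new column?
At the end, as the last column. If a column with the same name already exists, it is replaced in its current position instead.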
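A short sketch of both `withColumn` behaviours, as a Scala script. `customerDf` and the `country` column are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, upper}

val spark = SparkSession.builder().master("local[*]").appName("withcolumn-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for customerDf
val customerDf = Seq(("Ada", "Lovelace")).toDF("firstname", "lastname")

// New name: the column is appended as the last column
val appended = customerDf.withColumn("country", lit("UK"))

// Existing name: that column is replaced where it already sits
val replaced = customerDf.withColumn("firstname", upper($"firstname"))

appended.printSchema()
```

If a specific column order matters, a final `select` with the columns listed explicitly is the usual way to enforce it.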